This study is directed toward the improvement of recognition
techniques by developing an effective set of characterizing
measurements to be made on the voice signal. Specific measurements
are made on speech events that have been segmented and located in
the utterance. In this study these segments are located manually.
The selection of these segments and the aspects of each to be
measured are guided by acoustic and phonological theory and the
relations of vocal tract shapes and gestures to speech. The F-ratio
of the analysis of variance is used to evaluate the
speaker-separating ability of a measurement, and a technique is
developed to evaluate the degree of dependence between pairs of
measurements. [...] were found [...] fundamental frequency, features
of vowel and nasal spectra, estimation of the glottal source spectrum
slope, word duration, fricative spectrum shape, and stop consonant
prevoicing. The development of these measurements was facilitated by
the use of a highly flexible digital computer [...] designed for
on-line speech research.
INTRODUCTION


The information contained in human speech includes much more than
the words of the language that the speaker intends to transmit.
Superimposed on the linguistic component, there is a socio-linguistic
component, which can tell the listener about the general background
of the speaker, and a personal component, which can give the listener
information about the identity of the speaker (Ladefoged and
Broadbent, 1957). One may also identify an emotional-expressive
component, which reveals the emotional state of the speaker and his
feelings about the message.

Recognizing the person from the sound of his voice is a common
experience for anyone who uses the telephone or listens to the radio.
To be sure, the context of immediate events and the content of what
is said often contribute strongly to the identification, but such
recognition also occurs in situations where there is no doubt that it
was triggered by the acoustic signal alone. This ability of human
listeners has been confirmed by experiments (Pollack, et al., 1954;
Stevens, et al., 1968).

In this age of information processing, the question of
characterizing and recognizing different voices is naturally of
interest. It is conceivable that machine (computer) methods can
surpass human performance by virtue of their capacity for data
storage and rapid, detailed analysis. For the business world,
automatic speaker recognition could open new vistas of convenience
services such as voice identification to supplant the credit card or
to control access to a facility or to privileged information. It
could also find application in security or law enforcement.
Furthermore, research on automatic speech recognition shows that
differences in speech signals due to different speakers greatly
increase the difficulty of the recognition. Better understanding of
these speaker differences could make compensation for them possible
in such devices.

Differences in voices stem from two broad bases: organic and
learned differences (Garvin and Ladefoged, 1963). Organic differences
are the result of differences in the sizes and shapes of the
components of the vocal tract: larynx, pharynx, tongue, teeth, and
the oral and nasal cavities. Since the resonances of the vocal tract
and the characteristics of the sound energy sources depend on just
these anatomical factors, these differences lead to differences in
fundamental frequency, laryngeal source spectrum, and formant
frequencies and bandwidths. Learned differences are the result of
differences in the patterns of coordinated neural commands to the
separate articulators learned by each individual. Such differences
give rise to variations in the dynamics of the vocal tract such as
the rate of formant transitions and coarticulation effects. Naturally
many speaker-dependent properties are affected by both of these
factors.
The problem of speaker recognition, like most problems in pattern
recognition, may be considered to be divided into two parts:
measurement and classification. In the first part, the pattern under
test (a voice signal from an unknown speaker, in this case) is
subjected to a number of measurements, resulting in a set of numbers
which (ideally) characterize the pattern. These values in turn act as
inputs to a classification scheme, which compares them with stored
information on known reference patterns and makes a decision as to
the class membership of the tested pattern. Descriptions of pattern
recognition work by various authors place different degrees of
emphasis on these two aspects, with the greater emphasis usually
placed on classification. Specifically, in speaker recognition, the
effort spent on the characterizing measurements does not seem to be
consistent with either the effort spent on classification procedures
or with the state of our understanding of the way speech is produced.

Well chosen measurements are important to pattern recognition
problems in several respects. First of all, they must adequately
characterize the patterns under test. No amount of decision-making
sophistication can compensate for a basic lack of information.
Furthermore, the amount of processing required in the classification
phase is primarily determined by the complexity of the distributions
underlying the measurement data. For example, optimum classification
schemes of the linear type will fail if the classes cannot be
described by disjoint, convex, connected regions. It should also be
noted that the more sophisticated classification schemes effectively
estimate higher order properties of the underlying distributions and
consequently require greater quantities of data to achieve
statistical significance (Sebestyen, 1962). A well chosen set of
measurements should permit the effective use of economical decision
making procedures.

In the case of measurements which are adequately representative
of, but only generally related to, the differences between speakers,
it is effectively left to the classification process to separate the
speaker-selective effects from irrelevant variations. More of this
burden should be shifted from the classification process to the
measurement process. This measurement phase should be selective and
efficient rather than merely systematic and sufficient.

The aim of this study was to investigate and specify
speaker-characterizing measurements which are both efficient in
discriminating speakers and amenable to automatic measurement. These
measurements are performed only on selected speech segments, rather
than throughout an entire utterance, and each measurement is tailored
to its speech segment. The selection of these speech segments and the
aspects of each to be measured were motivated by acoustic and
phonological theory and the relations of vocal tract shapes and
gestures to speech. The development of these measurements was
facilitated by the use of a highly flexible digital computer la-
CHAPTER 2

REVIEW OF LITERATURE


In order to evaluate the literature on automatic methods of
speaker recognition, it is necessary to distinguish the possible
tasks a speaker recognition system may perform. Let the term speaker
recognition refer to the general problem of relating a voice signal
to the person who uttered it. In speaker identification, the task is
to classify the unknown voice signal as belonging to one of m
speakers (closed set paradigm) or as belonging to one of m speakers
or to some person outside that set (open set paradigm). An important
special case of the open set paradigm is that in which m equals 1,
which is called speaker verification or authentication.
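
The three task definitions above can be illustrated with a small
sketch. It is an illustration only: the nearest-reference rule, the
Euclidean distance, the threshold parameter, and all function names
are assumptions introduced here, not procedures taken from the
studies reviewed in this chapter.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify_closed(unknown, references):
    """Closed set: always return the nearest of the m known speakers."""
    return min(references, key=lambda name: distance(unknown, references[name]))

def identify_open(unknown, references, threshold):
    """Open set: return the nearest speaker, or None when even the nearest
    reference lies farther away than the (hypothetical) threshold."""
    best = identify_closed(unknown, references)
    return best if distance(unknown, references[best]) <= threshold else None

def verify(unknown, claimed_reference, threshold):
    """Verification: the open-set case with m = 1 (accept or reject a claim)."""
    return identify_open(unknown, {"claimed": claimed_reference}, threshold) is not None
```

Written this way, verification is literally the open set routine
called with a single reference, which mirrors the definition in the
text.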

Since past works in automatic speaker recognition differ not only
in the form of the task, but also in the size of the speaker ensemble
and in the restrictions imposed on the acoustic signal, comparison of
the effectiveness of the systems in terms of error rate is possible
only in a general way. In the present review, the measurements
performed in these past efforts will be emphasized.

In one of the earliest works on automatic speaker recognition,
Edie and Sebestyen (1962) proposed the automatic sampling of thirteen
measurements during an entire utterance. These measurements were the
first four formants, pitch period, envelope amplitude, the time
derivatives of all of these, and a parameter related to length of the
current voiced interval. In a verification experiment, using only
[...] and [...] and pitch period sampled at 5 points 0.2 seconds
apart, error rates of 7-10% with one known and four unknown speakers
were obtained. In an identification experiment, F1, F2, F3, and F4
were measured for 11 speakers and an average error of [...] was
obtained. It is not clear whether the training and test data were
taken from the same linguistic context.

A different approach to measurements was described by Pruzansky
(1963). Intensity-frequency-time patterns were obtained by using a
17 channel filter bank covering 200-7000 Hz. The speech data were
single common words excerpted from context, spoken by 10 male and
female speakers. Reference patterns for each talker and each word
were formed by averaging three repetitions, and a separate repetition
was used for testing. Classification was performed by correlating the
unknown pattern with each reference pattern. Each word was tested
separately, and an average error of 11% was obtained. If the patterns
were averaged over time so that only frequency information remained,
the error was also 11%. Averaging over frequency instead resulted in
a much higher error.

Pruzansky and Mathews (1964) used the same data set described
above, but considered each time-frequency "cell" as a separate
measurement. In an experiment to determine the effect of the size of
the cell in the time and frequency dimensions, the "quality" of each
such cell for speaker recognition purposes was evaluated for the data
set by means of the F-ratio of the analysis of variance. This
statistic is proportional to the ratio of the variance of the speaker
means to the average speaker variance; the higher this value, the
more distinguishable are the individual speaker distributions, on the
average. More will be said about this statistic in Chapter 3. The
measurements with the highest F-ratios were used, and performance
leveled off drastically after only about 10% of the total number of
measurements was used. Furthermore, performance generally increased
as the time dimension of the individual cells increased, but
decreased when the frequency dimension increased. This confirms the
usefulness of the frequency-dependent information found in the
previous study.
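
As a rough illustration of the statistic just described, the sketch
below computes the textbook one-way analysis-of-variance F-ratio for
a single measurement. The function name and the data layout (one list
of repeated values per speaker) are assumptions made for this
example, and the normalization used in the cited study may differ
from this form by a constant factor.

```python
from statistics import mean

def f_ratio(samples_by_speaker):
    """One-way analysis-of-variance F-ratio for a single measurement.

    samples_by_speaker: list of lists, one list of repeated measurement
    values per speaker (at least two speakers, and more values than
    speakers in total).
    """
    m = len(samples_by_speaker)
    total = sum(len(s) for s in samples_by_speaker)
    grand_mean = mean(x for s in samples_by_speaker for x in s)

    # Between-speaker mean square: spread of the speaker means.
    between = sum(len(s) * (mean(s) - grand_mean) ** 2
                  for s in samples_by_speaker) / (m - 1)

    # Within-speaker mean square: pooled spread inside each speaker.
    within = sum((x - mean(s)) ** 2
                 for s in samples_by_speaker for x in s) / (total - m)

    return between / within
```

A measurement whose speaker means are far apart relative to the
scatter of each speaker's repetitions gives a large value; heavily
overlapping speakers give a value near 1. The sketch does not guard
against a within-speaker variance of zero.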

Becker, et al. (1964) also used this single-word data in
evaluating methods of summarization and classification. By summing
the frequency information across time and using a non-Euclidean
distance metric, they obtained an average identification error
of 3%.

In the single word matching schemes just described, time
registration was accomplished by aligning the maximum energy points.
Differences in total length were resolved simply by truncating to the
length of the shortest example. It is thus likely that corresponding
segments in other parts of the word were not in exact alignment in
many cases. This fact may help to explain the improvements in
performance as the measurements were averaged over successively
longer time intervals.
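
The registration step described above can be sketched as follows.
This is one possible reading rather than the procedure of the cited
studies: the function name, the representation of a pattern as a list
of frames with a parallel list of frame energies, and the choice to
truncate both patterns to their common overlap around the peak are
all assumptions made for illustration.

```python
def align_at_energy_peak(pattern_a, pattern_b, energy_a, energy_b):
    """Align two frame sequences at their maximum-energy frames and
    truncate both to the region they have in common."""
    peak_a = max(range(len(energy_a)), key=energy_a.__getitem__)
    peak_b = max(range(len(energy_b)), key=energy_b.__getitem__)

    # Number of frames available before the peak, and from the peak
    # onward, in both patterns.
    before = min(peak_a, peak_b)
    after = min(len(pattern_a) - peak_a, len(pattern_b) - peak_b)

    return (pattern_a[peak_a - before:peak_a + after],
            pattern_b[peak_b - before:peak_b + after])
```

Only the peak frames are guaranteed to correspond after this step.
Frames elsewhere are paired purely by their offset from the peak,
which is why events in other parts of the word can remain out of
alignment.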

This problem was circumvented in the automatic speaker
verification experiment by Carbonell, et al. (1965). The measurements
in this experiment were spectra from a 13 channel filter bank
covering 250-3000 Hz, sampled at three points in the word "baseball."
The measurement points were defined in terms of the phonemes of the
word so that they were made at an equivalent point in each word
(e.g., 16 msec after the release of the first /b/). The
classification scheme measured the Euclidean distance between pairs
of normalized spectra in a 13-dimensional space. No numerical results
were reported, but this measurement and simple classification scheme
operating on the three measurement points was described as
"encouraging."

The automatic speaker verification system of Meeker, et al.
(1967) also made its measurements during certain specified phones.
An automatic speech recognition system selected occurrences of
/i, ɛ, [...], a/ from continuous speech. The time derivative of each
output of a 19 channel filter bank was coarsely measured during each
vowel selected, and essentially averaged over the occurrences of that
vowel. Classification was accomplished by essentially calculating the
Euclidean distance between the four averaged vowels and the reference
data of the speaker to be verified. Using 11 male speakers and
setting the false dismissal (rejection of the correct speaker)
probability to [...], an average false alarm (acceptance of an
incorrect speaker) probability of [...] was obtained.

The nasal consonant /n/ was the focus of an identification scheme
reported by Glenn and Kleiner (1968). Spectra taken over the range
1.0-3.5 kHz during the middle of /n/ in initial, medial, and final
positions and averaged over 10 examples were used as design and test
measurements, and classification was accomplished by correlation.
With a population of 30 male and female speakers, an error rate of 7%
was obtained. When the unknown data was averaged over fewer
utterances, the average error rate was much higher.

In a recent study, Atal (1968) investigated the significance of
fundamental frequency contours for speaker recognition. The
fundamental frequency was accurately measured during repetitions of a
sentence of about 2 seconds duration by 10 female speakers. The pitch
contours were smoothed and time scaled to the same length. Then a
reduction of dimensionality using the Karhunen-Loeve transformation
and a linear clustering transformation produced the final
10-dimensional data vectors. The classification procedure
(identification) used the Euclidean distance between these
transformed vectors and obtained an average error of 3%. When
sentence duration was added to the data as an additional dimension,
the error was reduced to 2%. Therefore, as one might expect, pitch
information has also been shown to be a relevant measurement for
speaker recognition.

Over the short history of automatic speaker recognition, a shift
can be seen in measurement strategy, from general measurements
throughout an utterance to measurements performed on specific speech
events. Long-term averages have not been commonly used, probably
because there is much in the personal component of the speech signal
that is inherently short-term. The use of nasal consonants is an
excellent but isolated example of selecting a measurement location
especially for its effectiveness in characterizing the speaker. The
measurements described here do encompass aspects of the acoustic
signal that are dependent on the structural and learned
characteristics of individual speakers, but they do so only
generally. For example, a 25 component spectrum of /n/ does reflect
the size and shape of the speaker's nasal cavity, but the formant
around 1 kHz is thought to be closely tied to the length of the nasal
tract and would thus characterize the speaker much more directly.
Efforts to relate measurements more directly to vocal tract structure
and to specific articulatory events should result in increased
effectiveness of the measurement phase.
CHAPTER 3

CHOOSING MEASUREMENTS FOR SPEAKER RECOGNITION


This chapter will be a discussion of general principles of
choosing and evaluating measurements for speaker recognition,
including quantitative measurement evaluations. However, the matter
of the selection of the speech events on which the measurements were
made will be deferred until Chapter 5.

The function of the measurement phase of a speaker recognition
system is to perform a number of characterizing measurements on the
voice pattern under test. Simply put, the speech characteristics
measured should ideally:

- occur naturally and frequently in normal speech

- vary as much as possible among speakers, but be as
consistent as possible for a given speaker

- not change over time or be affected by poor health

- not be affected by reasonable background noise or depend
on specific transmission characteristics

- not be modifiable by conscious effort of the speaker,
or at least, be unlikely to be affected by attempts
to disguise the voice

- be easily measurable


Some of these constraints can be relaxed for most practical
systems, but it is good to keep in mind the most generally
useful qualities for speaker recognition measurements.
3.1 [...]

One class of measurement schemes that has been used in the past
performs a set of measurements at 10-20 msec intervals throughout an
entire utterance (Pruzansky, 1963; Pruzansky and Mathews, 1964;
Becker, et al., 1964; [...], et al., 1966). There are three
outstanding difficulties with this approach.

First, because of the normal detailed differences in timing of
each utterance, corresponding articulatory events do not occur at
exactly the same times, even if the utterances are registered at a
particular point, such as the beginning or the energy peak.
Therefore, comparisons of the measurements at points where the
utterances are out of alignment are between somewhat different
events. It may be argued that these misalignments are reflections of
temporal patterns associated with learned characteristics of
different speakers. This is indeed so, but in this form the temporal
variations interfere with the comparisons of similar events. We need
to separate these effects, taking account of the useful temporal
patterns while also making comparisons between similar articulatory
events.

Secondly, regular and rapid sampling of the voice signal with the
characterizing measurements produces sets of data that have a high
degree of redundancy. Reducing the sampling rate would only add
problems, since it would increase the chances of missing significant
speech events.

Finally, a given set of measurements is not optimally suited to
every segment of an utterance. For example, fundamental frequency is
meaningless during voiceless intervals; low and mid-frequency
formants have no significance during voiceless fricatives.

A selective and efficient approach to measurement is to perform
some degree of segmentation and recognition of the linguistic
component of the speech signal before the measurements proper. This
is done in order to locate certain speech events of interest and then
to make appropriate measurements at each of these points. Similar
events can then be compared with a minimum of interference due to
timing differences. Furthermore, the recognition of events and
boundaries in the acoustic representation allows the separate
measurement of relevant temporal patterns.

Segmentation of the acoustic signal is one of the knottier
problems in speech recognition, but the general question need not
concern us. In the application of speaker verification and probably
in many instances of speaker identification, the use of a known
linguistic context is a valid assumption. In this case, the necessary
segmentation would not be difficult. In many instances, the system
designer is even free to specify what utterance the speakers must
say, so he may tailor the utterance both to contain an advantageous
set of phonemes and to be easily segmented.
Spinrad (1963) has stated two criteria that a possible
characterizing measurement must meet: not only must the measurement
characterize the patterns to be recognized, but it must also be able
to be performed effectively and correctly. This latter point is not
just idle philosophy, but a question of practical significance. For
example, the measurement of formant frequencies in cases where the
formants are close, as in /a/, is often difficult. It is possible to
make provision in the classification procedure for the atypical
absence of a measurement, but since this represents a loss in
information, in general it is desirable for such absences to occur as
infrequently as possible.

If we wish to use these two criteria to evaluate measurements
individually, we should add a third one: independence of the
measurements. In general, we seek to avoid redundant measurements. If
we know the measurements are independent, then we know that all of
the measurements are contributing to the classification process.
Efficiency of representing the speakers means that the required
processing capacity and time is minimized. Furthermore, independence
means that in the classification process, the measurements can be
validly considered separately rather than jointly. All optimum
classification schemes must effectively estimate the joint
probability distribution over the measurements for each speaker. If
the measurements are independent, the joint distribution is simply
the product of the individual distributions. Similarly, nonoptimal
classification schemes must account for dependencies or suffer the
loss in performance that results from ignoring them.
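
The practical consequence of independence can be sketched in a few
lines. Everything specific in this example is an assumption made for
illustration: each measurement is modeled by a one-dimensional
Gaussian, a speaker model is a list of (mean, variance) pairs, and
the function names are hypothetical. The point is only that, under
independence, the joint density is a product, so the logarithms of
the individual densities simply add.

```python
import math

def gaussian_log_density(x, mean, variance):
    """Log of a one-dimensional Gaussian density."""
    return -0.5 * (math.log(2 * math.pi * variance) + (x - mean) ** 2 / variance)

def joint_log_likelihood(measurements, speaker_model):
    """Joint log-likelihood of a measurement vector for one speaker,
    assuming independent measurements: a sum of per-measurement terms."""
    return sum(gaussian_log_density(x, mu, var)
               for x, (mu, var) in zip(measurements, speaker_model))

def most_likely_speaker(measurements, models):
    """Pick the speaker whose model gives the highest joint likelihood."""
    return max(models, key=lambda name: joint_log_likelihood(measurements, models[name]))
```

With dependent measurements the sum above is no longer the true joint
log-likelihood, and a scheme that uses it anyway accepts the loss in
performance mentioned in the text.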

In selecting measurements we should be guided by acoustic and
phonological theory and by the relations of vocal tract shapes and
gestures to speech. There are several such considerations which have
direct relevance to speaker recognition.

Measurements which relate mainly to structural differences should
do so as directly as possible. The unique vocal tract of each
individual is a fundamental basis of speaker recognition. Some people
compare the structural basis of speaker identification to that of
fingerprint identification (Kersta, 1962). Since the relation of the
acoustic signal to anatomy is much less direct than that of the
fingerprint, this argument is in general tenuous, but acoustic
measurements which do find justification in terms of specific
anatomical features should not only be effective ones, but they
should have the effect of increasing confidence in the effectiveness
and reliability of speaker recognition in general. Furthermore, the
sources of variation are minimized if a measurement is related to a
specific anatomical feature rather than to anatomy in only a general
way.

The vocal tract displays different characteristics during
different speech sounds. Rather than make general measurements, such
as formant frequencies and their time derivatives, on every speech
segment, we should tailor the measurements to the specific speech
event being measured.

There are certain acoustic correlates of the distinctive features
in a given language that are significant in the production and
perception of those features; they carry the linguistic information
(Stevens, in press). Other acoustic attributes of the signal are then
extra-linguistic; they do not enter into the process of transmitting
the message. It is among the extra-linguistic attributes that we
should look for possible speaker-selective characteristics. For
example, in the vowel /a/, the essential acoustic feature is the
closeness of the first and second formants, resulting in a broad
concentration of energy in the neighborhood of 1 kHz. The details of
the relationship, such as relative amplitude of the spectrum peaks,
absolute frequencies, or, within limits, separation of the peaks, are
probably not important for the perception of /a/, as evidenced by the
fact that the productions of different individuals differ in these
respects.

The phonology of the language imposes constraints on the
combinations of phonemes which may form words of that language. If
the presence of one phoneme constrains the possible phonemes which
may follow it, then certain rules for the production of that phoneme
may be optional; their function has already been performed by the
constraints imposed by the first phoneme. For example, in English, if
a stop follows a nasal in a final consonant cluster, it must have the
same place of articulation as the nasal. Words like bump and bunt
occur in English, but bumt is not allowed. Hence a speaker does not
have to be precise in his articulation of that stop. Variations in
that articulation due to different speakers would be
extra-linguistic.


Certain articulatory features or feature sets are not used for
phonetic distinctions in some languages. These too are attributes
which may vary between speakers. For example, in English, voicing
during closure of a voiced stop is optional.


In addition to the general requirement that the measurements be
efficient, there may also be requirements imposed by the specific
implementation of the speaker recognition system. In speaker
verification, it may be assumed that the speaker is cooperative,
since he wishes to be identified. (The possibility of mimicry has not
been extensively studied, but it will eventually have to be
considered.) In speaker identification, however, we may not be able
to assume that the speaker is not attempting to disguise his voice.
This would mainly affect measurements which are derived from learned
characteristics, such as dialect and intonation. It is also possible
that some structural characteristics would be modified, through a
distortion of the vocal tract such as rounding the lips or placing
objects in the mouth. A serious effort to thwart voice disguises
would have to include a study of the acoustic characteristics that
would be affected.




The frequency characteristics of the transmission system that
carries the voice signal may affect the choice of measurements. The
telephone, for example, has a bandwidth restricted roughly to the
range of 250-3000 Hz, so high-frequency spectral characteristics are
not present and direct measurement of fundamental frequency is not
possible. (Of course it is possible to reconstruct the fundamental by
suitable processing.) In general, temporal patterns and measurements
with the dimension of frequency, such as fundamental and formant
frequencies, would be undistorted, but some sort of compensation for
the effect of transmission characteristics would be required for
measurements of spectrum amplitudes.


3.3 Evaluating candidate measurements

Given a number of possible measurements which have been
selected with as much attention as possible to a priori
considerations such as those outlined above, how can we evaluate
the suitability of each measurement to the speaker recognition
problem, in order to know which ones to keep and which
ones to discard? Unfortunately, there is no objective way of
evaluating a measurement by itself. The ultimate utility of
a measurement depends upon the nature of the classification
system that follows it. Only after a classifier has been
coupled to the measurement system can such meaningful measures
as error rate and distribution of errors be used. However,
given the results of a measurement performed on multiple
repetitions of an utterance by each of a suitably chosen set
of speakers, we can evaluate certain general but useful
properties related to the capability for separating speakers
and to the extent of intermeasurement dependence.


3.3.1 Ability to separate speakers

The measurement data for each individual speaker may be
regarded as samples from a distribution associated with that
speaker. The individual speaker distributions of an ideally
effective measurement would be disjoint. In practice, they
are not disjoint, but it is desirable that they be as narrow
and as widely separated as possible, in order that the test
value of a measurement be associated with as few speaker
distributions as possible. An intuitive measure of this
condition is the average relative variance, which may be defined
as the ratio of the average individual speaker sample variance
to the total population sample variance. If this measure is
low, then on the average, the individual distributions are
narrow with respect to the distribution of the population.

A similar statistic which has been found useful by
previous investigators is the F-ratio of the analysis of
variance (Pruzansky and Mathews, 1964; Das, 1969). For the
case where the number of measurements is the same for each
speaker, and equal to n, the F-ratio is given by:

\[ F = \frac{n \, (\text{variance of speaker means})}{\text{average of speaker variances}} \]

(Since this statistic depends on n, a more general statistic
would be F/n, but in the present study, n will always be equal
to 10, so this normalization need not concern us.) The value
of F increases as the individual distributions spread farther
apart and as they become narrower. The F-ratio has the
desirable property of invariance to translation and scaling. It
is shown in Appendix I that ranking measurements according to
descending values of F is equivalent to ranking them according
to ascending average relative variances. Although the use
of F is intuitively appealing, it should be pointed out that
it is not optimal in the sense of minimizing any error
probability, and it takes no account of possible dependencies
between measurements.
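
The two statistics of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy data are hypothetical, and sample variances (divisor one less than the number of values) are assumed throughout, which the text does not specify.

```python
from statistics import mean, variance


def f_ratio(data):
    """F-ratio for equal-sized groups: n * var(speaker means) / mean(speaker variances)."""
    n = len(data[0])
    if any(len(reps) != n for reps in data):
        raise ValueError("each speaker needs the same number of repetitions")
    speaker_means = [mean(reps) for reps in data]
    between = variance(speaker_means)                 # sample variance of the speaker means
    within = mean([variance(reps) for reps in data])  # average per-speaker sample variance
    return n * between / within


def average_relative_variance(data):
    """Average per-speaker variance divided by the variance of all data pooled."""
    pooled = [x for reps in data for x in reps]
    within = mean([variance(reps) for reps in data])
    return within / variance(pooled)


# Hypothetical data: 3 speakers, 3 repetitions each (not taken from the study).
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(f_ratio(data))                    # 27.0
print(average_relative_variance(data))  # about 0.133

# Shifting and scaling every value leaves F unchanged.
scaled = [[2.0 * x + 5.0 for x in reps] for reps in data]
print(f_ratio(scaled))                  # 27.0
```

With these definitions, a measurement whose speaker clusters are tight and well separated gives a large F and a small average relative variance, in line with the ranking equivalence stated above.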

Two other statistics have been proposed regarding the
capability of a measurement to separate classes. Divergence
(Marill and Green, 1963) and mutual information (Lewis, 1962;
Kamentsky and Liu, 1963) both require the estimation of the
underlying distributions. Since in this study there were
only 10 repetitions of each measurement for each speaker,
these statistics did not seem readily applicable.


3.3.2 Intermeasurement dependence

Much intermeasurement redundancy can be avoided by
intelligent choices of measurements. If a spectrum is measured
in the center of a tense vowel, another spectrum 10 or 20 msec
later will probably yield little new information. However, if
the measurements concerned are less clearly related (e.g.,
energies in certain spectrum ranges in certain phonemes), the
intuitive approach becomes inadequate. Statistical tests of
independence that would apply to this situation do not appear
to be easy to find. In the present study, a procedure has
been developed which is related to this problem, but not in
a strictly quantitative way. This procedure grew out of
another inquiry into the separability of classes, which will
require explanation.

The results of a single measurement on a set of speakers
may be represented pictorially as shown in Fig. 1a. The n
repetitions for each speaker are plotted in a horizontal line,
and the data for each of the m speakers is plotted on a
different line. If the measurement is a good one for speaker
recognition, the individual speaker data will be clustered
and the clusters will be separated from one another. A datum
is termed discriminable from another speaker if it lies
outside the range of that speaker as determined by the data of
that speaker. For example, in Fig. 1a, the rightmost datum
of speaker a is discriminable from speakers b and d, but not
from speaker c. Only comparisons with other speakers are
considered. If this comparison is performed for each of the
mn data against the m-1 other speakers, an average measure of
range exclusion discrimination is given by

\[ P_{red} = \frac{d}{mn(m-1)} \]

where d is the total number of such datum-speaker
discriminations. This quantity is a relative frequency; therefore it
may be regarded as an estimate of the average probability of
discrimination of one speaker from all other speakers in the
set, using this measurement, and according to the range
exclusion criterion stated above. This is admittedly not a
good estimate of the discrimination capability of the
measurement on the set of speakers, nor is it seriously intended to
be. It measures only the extent of overlap of the ranges of
the individual speaker data. Roughly speaking, if the
measurement is a poor one for speaker recognition, the ranges
will overlap; if it is a better one, they will tend to
overlap less.

Figure 1a. Hypothetical measurement data on n repetitions by each of
m speakers. Each speaker's data is plotted on a separate
horizontal line. (Horizontal axis: measurement value.)

Figure 1b. Range exclusion discrimination using two measurements.
The box delineates the ranges of Speaker a data in
both dimensions. (Horizontal axis: measurement #1.)

This procedure may be extended to the case of two or
more measurements, considered jointly. In this case, a datum
(of two or more components) is termed discriminable from
another speaker if it lies outside the range of that speaker's
data in one or more dimensions. For example, in Fig. 1b,
data b and d are discriminable from speaker a, but data
e and f are not.
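
The range exclusion count described above can be sketched as follows, for one measurement or for several considered jointly. This is a minimal illustration, not the program used in the study: the function name, the data layout (one tuple of measurement values per repetition), and the toy data are assumptions.

```python
def red_proportion(data):
    """Proportion of datum-speaker pairs found discriminable by range exclusion.

    data[s][r] is a tuple of measurement values (one per dimension) for
    repetition r of speaker s. A datum is discriminable from another speaker
    if it lies outside that speaker's range in at least one dimension.
    """
    m = len(data)
    # Range (min, max) of each speaker's data in each dimension.
    ranges = []
    for reps in data:
        per_dim = list(zip(*reps))
        ranges.append([(min(v), max(v)) for v in per_dim])

    discriminations = 0
    comparisons = 0
    for s, reps in enumerate(data):
        for datum in reps:
            for t in range(m):
                if t == s:
                    continue  # only comparisons with other speakers count
                comparisons += 1
                if any(x < lo or x > hi for x, (lo, hi) in zip(datum, ranges[t])):
                    discriminations += 1
    return discriminations / comparisons


# Hypothetical single measurement, 3 speakers with 3 repetitions each.
one_dim = [
    [(1.0,), (2.0,), (3.0,)],
    [(2.5,), (4.0,), (5.0,)],
    [(10.0,), (11.0,), (12.0,)],
]
print(red_proportion(one_dim))     # 16 of the 18 pairs are discriminable

# Hypothetical pair of measurements, 2 speakers: the second separates them fully.
two_dim = [
    [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)],
    [(2.5, 10.0), (4.0, 11.0), (5.0, 12.0)],
]
first_only = [[(x,) for x, _ in reps] for reps in two_dim]
print(red_proportion(first_only))  # 4 of 6 pairs using the first measurement alone
print(red_proportion(two_dim))     # 1.0 when both are considered jointly
```

When every speaker has n repetitions, the number of comparisons equals mn(m-1), so the returned value is the proportion defined in the text.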


Note that the total proportion of datum-speaker
discriminations is the same if the decisions are made considering
all measurements jointly, as above, or if the discrimination
decisions are made by first applying this procedure using one
measurement and then applying it to the datum-speaker pairs
not already found discriminable, using the next measurement,
and so on. The effect of applying this r.e.d. procedure with
the first measurement may be regarded as removing a proportion
P_1 of the datum-speaker pairs from further consideration,
since they have been found to be discriminable. The
sequential application of the r.e.d. procedure with the second
measurement effectively operates on the remaining proportion
(1 - P_1). After the second application, a proportion P_12 of
the total datum-speaker pairs has been found to be
discriminable. If probabilities, rather than probability estimates,
were being used, and the two measurements were independent,
then:

\[ P_{12} = P_1 + (1 - P_1) P_2 \]

\[ (1 - P_{12}) = (1 - P_1)(1 - P_2) \]

We may expect that the estimated probabilities may approximate
this relationship, or that the statistic

\[ \Delta P = \frac{1 - P_{12}}{(1 - P_1)(1 - P_2)} - 1 \]

will be close to zero for the case of independent measurements.
In fact, as will be shown in a later chapter, this statistic is
small for measurement pairs that may be intuitively called
independent and large for obviously dependent pairs.
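
As a small numerical illustration of this statistic, the sketch below evaluates it for two hypothetical cases. The function name and the probability values are assumptions chosen for the example, not results from the study.

```python
def delta_p(p1, p2, p12):
    """Dependence statistic: (1 - P12) / ((1 - P1) * (1 - P2)) - 1."""
    if p1 >= 1.0 or p2 >= 1.0:
        raise ValueError("statistic is undefined when P1 or P2 equals 1")
    return (1.0 - p12) / ((1.0 - p1) * (1.0 - p2)) - 1.0


# Hypothetical independent pair: P12 = P1 + (1 - P1) * P2.
p1, p2 = 0.6, 0.5
print(delta_p(p1, p2, p1 + (1.0 - p1) * p2))  # approximately 0

# Hypothetical fully redundant pair: the second measurement discriminates
# nothing that the first did not, so P12 = P1.
print(delta_p(p1, p2, p1))                    # approximately 1
```

A redundant second measurement leaves more pairs undiscriminated than independence would predict, which is why the statistic grows with dependence.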


Unfortunately, the distribution of this statistic under
the hypothesis of independence is not known, so it is not
possible to assign a critical region and significance level
to this test. For the purposes of pragmatic pattern
recognition, it may not be necessary to have strictly independent
measurements. It may suffice to use measurements that are
merely not strongly dependent, for which purpose the ΔP-test
with some threshold or the lowest combination of values of
ΔP for pairwise comparisons of measurements is a useful
technique.




CHAPTER 4


RECORDING AND PROCESSING OF DATA


4.1 Scope of the experiment

The speech data was taken from 21 adult male American
speakers, ranging in age from 22 to 42 years. None had a
noticeable speech defect. Regional accent was not closely
controlled; two speakers had mild southern accents. All
speakers were staff or students at the Massachusetts
Institute of Technology. They were apprised of the nature of the
experiment and were accordingly asked to speak normally.

Ten repetitions of six short sentences were recorded from
each speaker.

The text of the speech data was specified by the
experimenter. This is the usual condition in any speaker
verification paradigm, and it may not be unreasonable in the case
of identification. This was necessary because the acoustic
measurements were to be performed on specific segments of
the utterances.


The speech data was recorded in a single session. Only
speakers who were reasonably free from colds or other
inflammations were used. The stability of the measurements with
respect to time or to the state of health of the speaker was
not investigated.

The speech was recorded under low noise conditions with
high quality, wide bandwidth equipment. The effects on the
measurements of a reduction in signal to noise ratio,
bandwidth, or other condition of fidelity were not investigated.

Since the construction or simulation of a complete
automatic speaker recognition system was not the aim of this
study, the locations in the utterances where measurements were
made were determined manually. They were determined
systematically, however, in ways that were felt to be amenable to
automatic implementation by simple computer programs.


4.2 Devising test utterances

Devising the test utterances really cannot be separated
from selecting the measurements, since the utterances are
the vehicles for providing the speech events on which the
measurements are made. The measurements that were
investigated will be described in detail in Chapter 5. Aside from
the matter of the specific speech segments to be included,
there are general considerations which must also enter into
the process of making up the utterances.


Prior to the main experiment, some informal investigation
was done on the sentence, "She remembers me," spoken
by 10 people. As a result of this work and some use of the
microphone input to the SPADE5 computer configuration (see
section 4.4), a number of trial hypotheses were made about
speech events that should be included, and something was
learned about the rudiments of segmentation. In addition
to specific items to be investigated, such as fundamental
frequency in stressed and unstressed positions, nasal
consonants, and certain vowels, it was desired to include a wide
variety of vowels, diphthongs, fricatives, and stops for
possible investigation.

In this exploratory situation and also in practical
situations, at least several seconds of speech data are
required in order to provide sufficient number and variety of
speech events for measurement. One or more sentences are
preferable to isolated words or phrases. Grammatical
sentences provide the speaker with a standard and hopefully
unambiguous interpretation and hence a well defined
pronunciation.

Once the specific speech segments to be measured have
been selected, the task of incorporating them into a suitable
sentence can be a frustrating one. The speech segments should
be placed in favorable environments in the sentence, and the
sentence should be easy to segment, natural to say, and
usually spoken in just one way. It is no wonder that such
sentences sometimes end up appearing rather contrived!

In a declarative sentence, the speaker normally lets
his pitch and amplitude fall at the end of the sentence.
Intra-speaker variability is probably increased at this time,
for the pitch periods often become irregular, and there is
a tendency to accompany the drop in voice level with less
precise articulation. For this reason, the final syllable
is generally not suitable for measurement. If a steady-state
vowel measurement is desired, i.e., one primarily influenced
by vocal tract structure and articulatory position rather
than by dynamics, the vowel should be put in a context where
the formant targets or steady-state positions are likely to
be reached. Vowels should not be put in words in positions
where they are reduced. If a vowel is stressed, it is
generally lengthened. It has also been found that a vowel is
lengthened before a voiced consonant, and that vowel formant
targets are more closely approached if the consonantal
context is a stop rather than a fricative, even though the
duration is shorter (Stevens and House, 1963). Nasal consonants,
which are inherently low in intensity and are often short,
are clearest and loudest when they precede a stressed vowel.

The utterance should be designed with an eye to the
segmentation and recognition that will be required in order
to locate the measurements. For example, the sentence, "How
are you?" would be much harder to segment than "I saw Tom,"
because of the lack of voiceless segments and stops. Stops
and strident fricatives are useful landmarks for cueing the
segmentation of the sentence, but they cannot be sprinkled
in too liberally, or the sentence becomes difficult and
unnatural to say. Since we rely on the speaker using his own,
well-established speech gestures, we wish to minimize
unnaturalness. The problem of recognizing the beginning of an
utterance can be minimized if the utterance begins with a
stop or a vowel. Initial fricatives and nasals should be
avoided.


Li, et al. (1966) employed an interesting device to
promote naturalness. They used short phrases like, "My name
is (      )." They reasoned that the speaker would focus
attention on saying his own name, and hence the first two words,
which were actually used for measurement, would be free of
undue emphasis.


Some words in English have more than one acceptable
pronunciation, and individuals are not necessarily
consistent in the version they use, particularly if they have lived
in different regions and have been influenced by different
dialects. Common examples of such words are a, either, and
aunt. Stress can be similarly affected, as in downtown.
It is desirable to avoid at least the most common of these
words, in order that the performance of the system not
depend on the speaker's remembering a standard version of the
utterance. Naturally, the sentence itself should be
unambiguous, since the syntax affects the pronunciation.

For the purposes of this experiment, the six short
sentences given below were devised. This task was not an
easy one, and some of the considerations mentioned above
were occasionally compromised. The linguistic content of
these sentences is certainly beside the point. The numbers
associated with the sentences will be used to refer to them
later.

1. Cool shirts please me.

3. I cannot remember it.

4. Papa needs two singers.

5. A few boys bought them.

6. Cash this bond, please.


Even though no automatic segmentation was contemplated
for this experiment, each sentence begins with a stop or a
vowel. The first word in sentence 5 was occasionally so brief
that utterance initiation logic might have missed it. The
last syllable of each sentence is intended as a "filler"
because of the voice drop mentioned above. It was found,
however, that the use of the word please on the end also kept
the voice level high at the end of the sentence.

The sentences were designed to include a variety of
speech sounds, not all of which were investigated in this
experiment. Some of the principal ones are pointed out
below, as are some of the shortcomings.

1. Cool was specifically used to get a good
example of /u/. In many words, /u/ is not
fully articulated. Sentence 1 also contains
the fricatives /ʃ/, /s/, and /z/ and the
vowels /ɜ/ and /i/.

2. This sentence contains an example of the
diphthongized vowel /eɪ/ and also /æ/ and /y/.
The /æ/ in man may be nasalized.

3. This sentence contains the diphthong /aɪ/,
and nasals /n/ and /m/ in prestressed positions.
It turns out that there are two acceptable stress
markings for cannot, and care had to be taken
in recording speakers to insure that the second
syllable was stressed. The vowels /ɑ/ and /ɛ/
may be influenced by the nasals.

4. This sentence contains /ɑ/, /i/, and an /u/
that is too short to be fully articulated. The
/n/ is in prestressed position, but the /ŋ/ is
not favorably located for purposes of automatic
segmentation and location.

5. This sentence contains examples of diphthongs
/ju/ and /ɔɪ/ and the vowel /ɔ/.

6. This sentence contains /æ/, /ʃ/, /ɪ/, and
/s/. The /ɑ/ may be nasalized. The location of
the voiced stop /b/ following a voiceless sound
turns out to be useful (see section 5.6).

Obviously, the sentences also contain other speech sounds.
The stops and strident fricatives are useful for segmentation,
as are the nasals when they occur between vowels.

4.3 Recording procedure

The recordings of speech data were made in the Research
Laboratory of Electronics anechoic chamber using an Altec
68A dynamic microphone hung by a cord from the top of the
chamber. The microphone was positioned 10 inches from and
2 inches above the lips of the speaker. The speech signals
were recorded on a Presto 800 tape deck at 7.5 inches per
second. The recording apparatus was located in a nearby
studio. Ten repetitions of the six short sentences listed in
the previous section were recorded from each speaker in a
single session.

A program tape was played to the subject seated in the
anechoic chamber. This tape contained an explanation of the
purpose of the experiment, an explanation of the recording
procedure, and several practice sentences. The practice
sentences also served the purpose of allowing the level
control on the tape recorder to be adjusted to the subject's
normal voice level.


The 60 utterances the subject was to say were presented
by means of the program tape. This was done, instead of
having him read from a list, in order to insure uniformity
of the stress patterns in each sentence and to pace the
subject in order to avoid the intonation pattern of continuation
that subjects often use in reading items from a list. The
subject was reminded that there was a danger that he might
tend to slavishly imitate the exact intonation of the
utterances on the program tape, and he was asked to "say the
sentences in the same sense as that on the program tape." Most
of the subjects were acquaintances of the experimenter, and
it was felt that in most cases the utterances were spoken
naturally.

The six sentences were presented to the subject in mixed
order, at intervals of 10 seconds, so as not to tire the
subject and, again, to avoid "list intonation." A manually
operated switch in the recording studio cut off the program
material while the subject was speaking. If the subject
made a mistake, the program tape was stopped and he was asked
to repeat the sentence correctly.

The master data tapes were subsequently dubbed onto
submaster data tapes, using two Presto 800 tape decks. The
utterances were rearranged in the process so that each
submaster tape contained one sentence, with only short pauses
between the 10 repetitions by each speaker. A copy of each
submaster tape was then made for subsequent analysis, and
the submasters themselves were preserved as backups. The
double copying increased the tape noise by several dB, but
this was noticeable only in the high frequencies on the
spectrum analyzer that was used in this study. Since nonstrident
fricatives were not going to be studied, this high frequency
noise was not important.


4.4 Analysis hardware and software

The computer facility of the Speech Communication group
is built around a Digital Equipment Corporation PDP-9 general
purpose computer with 24K of core memory. The computer is
coupled to peripheral equipment especially designed to
facilitate on-line speech research (Henke, 1968). This highly
flexible arrangement makes it possible, through convenient
interconnections and proper programming, to create in effect
a special purpose on-line laboratory instrument.

A system of programs (SPADE5) was written to enable this
facility to be used as a general and special purpose spectral
analysis instrument. A block diagram of this configuration
is shown in Fig. 2. This diagram depicts logical rather than
physical units, for the blocks enclosed by broken lines
denote functions which are performed by parts of the computer
program rather than hardware. The speech source input was
either an Ampex 401A two-channel tape deck, which included
provision for control by the computer program, or a
microphone and amplifier installed at the operating position.

The principal speech analysis tool in the system is the
real-time spectrum analyzer. This consists of a +6 dB/octave
stage to emphasize the high frequencies, followed by 36 single
tuned bandpass filters covering the range 150 - 7025 Hz. The
filter specifications are given in Table 1. The center
frequencies are spaced linearly up to 1650 Hz and logarithmically
thereafter. The characteristics of adjacent filters cross
at their 3 dB points. Each filter is followed by an
electronic rectifier and low pass filter (time constant of 10 msec).

A 36 channel multiplexer selects the filter output to be
sampled. A logarithmic analog-to-digital converter gives
the output voltage directly in decibels. A complete scan of
36 channels is performed in 1.3 msec, which is small compared
to the averaging time of the smoothing filters.

Figure 2. Speech Communication computer facility configuration for SPADE5

Since the first harmonic was present in the recorded
speech data, fundamental frequency (F0) could be measured by
the rudimentary scheme shown in Fig. 2. A low pass filter
with an 18 dB/octave skirt slope was set to a cutoff
frequency of about the highest value of F0 expected for the
particular speaker (often about 160 Hz). The output of this
filter then consisted mainly of the first harmonic. This was
sampled and converted to digital form at a 2 kHz rate, and
a simple zero-crossing detection algorithm calculated the
estimates of F0. This method sometimes produced spurious
values during unvoiced segments (since there was no check to
see if voicing was present) and sudden transitions, but for
most voiced segments, and for vowels in particular, it gave
reliable and repeatable values. More effective and
reliable pitch extraction schemes have been described in the
literature (Gold, 1962; Noll, 1967). The limitations of this
one should not be interpreted as limitations on measuring
fundamental frequency in automatic speaker recognition
systems.
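
A zero-crossing estimate of this kind can be sketched as follows. This is an illustration only, not the algorithm used in the study: the function name is hypothetical, the input is assumed to have been low-pass filtered already so that it consists mainly of the first harmonic, and the linear interpolation between samples is an added assumption. Like the scheme described above, it makes no check for voicing.

```python
import math


def f0_from_zero_crossings(samples, sample_rate):
    """Estimate F0 (Hz) from positive-going zero crossings.

    Returns None when fewer than two crossings are found.
    """
    crossing_times = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:
            # Linear interpolation locates the crossing between samples i-1 and i.
            frac = -a / (b - a)
            crossing_times.append((i - 1 + frac) / sample_rate)
    if len(crossing_times) < 2:
        return None
    periods = len(crossing_times) - 1
    return periods / (crossing_times[-1] - crossing_times[0])


# Hypothetical test signal: a 127 Hz sinusoid sampled at 2 kHz for 0.1 sec.
rate = 2000
signal = [math.sin(2.0 * math.pi * 127.0 * k / rate + 0.3) for k in range(200)]
print(f0_from_zero_crossings(signal, rate))  # close to 127
```

Averaging over all periods in the segment, as done here, is one simple choice; a frame-by-frame estimate would use only the crossings nearest each frame.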


A variation on the manual analysis-by-synthesis
procedure described by Bell et al. (1961) was implemented for
vowels on the PDP-9. A vowel spectrum was analyzed by an
iterative procedure of postulating a set of pole positions,
calculating the filter bank response to a vowel having that
pole configuration, comparing the calculated response to the
measured spectrum, and revising the pole locations
accordingly. In this version, instead of the calculation of the
filter bank response for a given pole configuration, a 30
msec segment of vowel is synthesized from these parameters
and analyzed by the filter bank itself. The synthesis is
performed by TAVS, the vowel portion of a five formant 10 kHz
sampled data terminal analog speech synthesis program written
by Prof. W. Henke (1969). Not only is this procedure faster
than the original implementation, but it is also more
accurate, since the synthesized spectrum is derived using the
measured value of F0 instead of an assumed value of 100 Hz,
and there is no error in calculating the filter bank response.
The principal limitation on the accuracy of the present
system is the quality of the glottal source spectrum
approximation.

The results of most of the sections of the program
SPADE5 are displayed on a 16-inch cathode ray tube display.
An arrangement of pushbuttons, knobs, and toggle switches
provides a convenient interface for the user to control the
operation of the program.

The principal features of SPADER are illustrated by the typical
CRT display shown in Fig. 3. The short-time spectrum and
fundamental frequency are measured every 10 msec during an
utterance. The data buffers have a capacity of 2.5 seconds of
speech at this rate. The two graphs in the lower part of Fig. 3
represent functions of time, from 0 to 2.5 seconds. The lower
graph is fundamental frequency. The upper one, which will be
called the "energy function," is formed by summing and averaging
the outputs of certain filters, selectable by toggle switches.
For nasal consonant measurements (and Fig. 3), filters 5-8 were
used; in all other cases, filters 0-5 were found to work well.
With these groups of filters, the energy function is a measure
of low frequency energy, and it is used as a "syllable map" of
the utterance for segmentation purposes. The vertical cursor
shows the point in the utterance at which the spectrum displayed
above was measured. Other capabilities of this program include
the measurement of the amplitude and frequency of any feature on
the displayed spectrum, storage of a selected spectrum in one of
16 special buffers for later comparison and measurement, the
typing of spectrum data numerically or graphically, and the
performance of special measurements, such as second and third
central moments.

Fig. 3. A typical SPADER display. The two graphs in the lower
half represent functions of time, from 0 to 2.5 sec. The upper
one is the sum of the outputs of filters 5-8, and the lower one
is F0. The vertical cursor shows the point in the utterance at
which the short-time spectrum displayed above was measured. The
horizontal axis of the spectrum represents frequency, from 150
to 7025 Hz, and the vertical axis represents amplitude in dB.
(The spectrum shown occurs in the first /m/ in "I cannot
remember it." It is the 175th frame in the buffer, and the value
of F0 at that point is 127 Hz.)
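The energy function described above can be sketched in a few lines. This is only an illustration of summing and averaging selected filter outputs frame by frame; the function name, the data layout (one list of 36 filter levels in dB per 10-msec frame), and the test data are assumptions made for the example and are not taken from SPADER itself.

```python
def energy_function(frames, filters):
    """Average the outputs (in dB) of the selected filters in each frame."""
    filters = list(filters)
    return [sum(frame[k] for k in filters) / len(filters) for frame in frames]


# Hypothetical data: 4 frames of a 36-channel spectrum, all channels at 0 dB.
frames = [[0.0] * 36 for _ in range(4)]
frames[2][0:6] = [30.0] * 6  # a vowel-like burst of low frequency energy

low = energy_function(frames, range(0, 6))  # filters 0-5
# The frame with the largest value marks the syllable peak.
peak_frame = max(range(len(low)), key=low.__getitem__)
print(low, peak_frame)
```

A peak of this function locates a syllable nucleus, which is how the "syllable map" is used for segmentation in the measurements that follow.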

The speech data was kept in analog form on the tapes, since
storage of the spectral data in digital form would have been far
too bulky. This meant that the data was not exactly

[...] analog tape, since
THE MEASUREMENTS INVESTIGATED


In this chapter the matters of the specific speech events to be
measured and the form of the measurements performed on each will
be discussed. Where appropriate, measurements that were rejected
will also be mentioned. Quantitative evaluations, as discussed
in Chapter 3, will be presented for each measurement. The
collective effectiveness of the measurements was evaluated by
means of a simple speaker identification procedure. The
algorithm and the results obtained will be discussed in the
final section of the chapter.

For each measurement, each example of the appropriate sentence
was read into the computer, the measurement location within the
sentence was manually located, and the measurement datum was
manually or automatically recorded, depending on the specific
measurement implementation. Since this study of the
effectiveness of the acoustic measurements was essentially
exploratory, it was felt that this interaction between the
experimenter and the segmentation and measurement processes was
preferable to having trial measurements performed automatically,
even at the cost of long hours spent taking measurement data.

Once the mechanics of a trial measurement were developed to the
point where data would be taken on every speaker's utterances,
the measurement locations were determined by simple rules and
procedures, so that the influence of the manual location
procedures would be minimized. Figures 4 and 5 contain
spectrograms of one example of each of the six sentences
contained in the data. The locations marked on them will be
referred to in the sections pertaining to the specific
measurements.

In the course of the investigation, the various acoustic
measurements were given mnemonic names, which will be used in
this chapter to refer to them. The pertinent statistics for each
measurement are summarized in Appendix II.


5.1  Fundamental frequency

The measurements of fundamental frequency proved to be the most
useful single measurements investigated. F0 was measured at
specific locations, rather than as an average over the whole
utterance, for two reasons. First, pitch measurements at several
locations in the utterance would probably be very dependent, but
in addition to average pitch, they would contain information
about the pitch contour, which had been used by Atal (1968). We
wished to find out if such measurements would be useful in the
context of a small number of efficient measurements (or if we
would do better to measure pitch only once and make other, less
dependent measurements). Second, we wished to find out if the
increment in F0 due to stress would be useful.

Fundamental frequency was first measured at six locations in
sentence 3. The names and locations of these measurements are
listed below:

3F01:  during the middle of "I," as shown by the large peak in
       the energy function corresponding to that syllable
       (point 3a in Fig. 4).

3F02:  during the middle of the first vowel in "cannot." The
       energy function drops suddenly at the onset of /n/, so
       the vowel is delineated between the burst of /k/ and the
       beginning of /n/ (point 3b in Fig. 4).

3F03:  at the peak of F0 during the syllable "not" (point 3d in
       Fig. 4). "Cannot" was stressed on the second syllable.

3F04:  during the first vowel in "remember," at the peak of the
       energy function for that syllable (point 3e in Fig. 4).

3F05:  during the middle of the second vowel in "remember"
       (point 3g in Fig. 4). The nasals on either side clearly
       delineate the vowel in the energy function.

3F06:  at the peak of F0 corresponding to the stress on the
       second syllable of "remember" (point 3h in Fig. 4). In
       those cases where there was a peak in F0 due to the
       stress, it usually occurred during the second /m/. If
       there was no rise, 3F06 was given the same value as 3F05.




The increments in F0 at the stressed syllables in "cannot" and
"remember" were obtained by subtracting 3F03 from 3F02 and 3F06
from 3F04. Inspection of these data showed large variations
within most individuals' utterances. Since in many cases these
variations were about the same as the total range, the
possibility of using these increments as characterizing
measurements was abandoned without further analysis. Measurement
3F06 was also discarded, since it was often no different from
3F05.

The F-ratios for the five remaining measurements are given
below.

    Measurement    F-ratio

    3F01           61.8
    3F02           71.2
    3F03           30.9
    3F04           51.8
    3F05           52.8


Measurement 3F03 is rated appreciably poorer than the other
four. A second look at some examples of sentence 3 easily showed
why. The stress in the word "cannot" was on the second syllable.
As a result F0 began to rise during the /n/ and often did not
reach a peak before the vocal tract closure of the /t/ cut off
the voicing. As a result of this sudden transition, the F0
measurements often contained several spurious (very high or very
low) values at this point. In these cases, the datum recorded
for 3F03 was the last value which connected continuously with
the previous values. Thus there was a greater time uncertainty
in the location of the measurement due to the sudden
articulatory movement at that point and the inability of the
measurement technique to handle this case well.
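The rule used for 3F03, taking the last value that connects continuously with the previous values, can be illustrated with a small sketch. In the study this judgment was made by the experimenter at the display; the function below, its name, and the 15 Hz frame-to-frame jump threshold are hypothetical choices made only for illustration.

```python
def last_continuous_f0(track, max_jump_hz=15.0):
    """Return (frame index, F0) of the last sample that continues the contour.

    Walk forward from the first frame and stop at the first frame-to-frame
    jump larger than max_jump_hz (an assumed threshold, not from the study).
    """
    last = 0
    for i in range(1, len(track)):
        if abs(track[i] - track[last]) > max_jump_hz:
            break
        last = i
    return last, track[last]


# Hypothetical F0 track (Hz, one value per 10-msec frame): a rise during /n/
# followed by spurious very high and very low values at the /t/ closure.
track = [118, 121, 125, 130, 134, 260, 62, 240]
print(last_continuous_f0(track))
```

The frame at which the contour breaks off depends on when the closure happens to cut off the voicing, which is the source of the time uncertainty noted above.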

The intra-speaker variability of this pitch measurement is
probably also increased by an articulatory phenomenon. There is
less need for precise control of the rise of pitch due to stress
in this context, since the /t/ which ends the stressed syllable
effectively controls it by terminating the voicing. This context
may be contrasted to the second syllable in "remember," where
there is no interruption of the airflow through the larynx, so
the F0 contour must be explicitly controlled. The moral of this
story is not to place F0 measurements in locations coincident
with sudden transitions from a sonorant to a nonsonorant.

Fundamental frequency was also measured in five other locations
in sentences 5 and 6.

5F01:  in the middle of "few," as shown by the energy function
       (point 5a in Fig. 5). (F = 81.0)

5F02:  at the peak of F0 in "few," usually very close to 5F01
       (point 5b in Fig. 5). (F = 84.9)

5F03:  in the middle of the diphthong in "boys" (point 5c in
       Fig. 5). (F = 54.3)

5F04:  in the middle of "bought" (point 5g in Fig. 5).
       (F = 69.5)

6F01:  in the middle of the vowel in "cash" in sentence 6
       (point 6a in Fig. 5). (F = 79.9)
The values of F-ratio for these five measurements confirm that
the variability in 3F03 was a special case and not a consequence
of the syllable being stressed. Aside from 3F03, every F0
measurement had a higher value of F-ratio than the next best
measurement. With the small number of examples at hand, there
seems to be no particular advantage to stressed or unstressed
syllables. It should be noted that the use of an F0 measurement
requires the assumption that the speaker is in some kind of
normal, cooperative state. Fundamental frequency is very
susceptible to stress on the speaker (Hecker, et al., 1968) and
it is perhaps the easiest and most obvious acoustic correlate to
modify for the purpose of voice disguise.


5.2  Nasal consonants

The articulatory configuration of the nasal consonants makes
them particularly appropriate for speaker recognition
measurements. They are formed by closing the mouth cavity at
some point and opening the velum, permitting air flow through
the nasal cavity. Hence a portion of the acoustic system for
nasal consonants is fixed and is not subject to articulatory
movement and variation. Glenn and Kleiner (1968) state that the
other articulators do not move during the period of oral
closure, in contrast to their virtually constant motion during
other phones of normal speech. This statement is perhaps an
approximation, but spectrograms of nasal consonants are
characterized by largely horizontal formants during the nasal
murmur. Furthermore, nasal consonants are not rare events, but
comprise 11% of the phonemic content of commonly spoken English
(Tobias, 1959).

Since the mouth cavity acts as a shunt, it introduces zeros into
the spectrum of nasal consonants. The spectrum of a clear /m/ is
shown in Fig. 3 in Chapter 4. The region between the first
formant and the second spectral peak contains a pole, but the
lowest zero due to the mouth cavity has effectively cancelled
its effect in the spectrum. The third and fourth spectral peaks
occur around 2 and 3 kHz. The spectrum of a clear /n/ is
similar, but the shorter mouth cavity means that the lowest zero
occurs higher in frequency, often effectively canceling the pole
in the neighborhood of 1.3 kHz, and leaving the pole just below
1 kHz in the clear (Fujimura, 1962).

The interplay between the mouth and nasal cavities can produce
considerable variability in the 700-1600 Hz portion of the
spectrum, depending on the nature of those cavities, and hence
on the individual speaker. The analysis and experiments of Fant
(1960) and Fujimura (1962) suggest that certain poles of the
transfer functions of the nasal consonants are closely tied to
the nasal cavity alone.

The first formant is very low, and it is ascribed to a
lumped-circuit resonance between the pharyngeal and the nasal
cavities. Fujimura found it to be quite stable. The second
formant, usually not visible in /m/ but often visible in /n/,
seems to be approximately a quarter wavelength resonance of the
nasal cavity alone when it occurs around 1 kHz in /n/. Fujimura
ascribes the formant occurring around 2 kHz in /m/ to the nasal
cavity by virtue of its stability and large bandwidth. He also
comments on the stability of the formant around 3 kHz in /n/,
which Fant assigns to the three-quarter wavelength resonance of
the nasal cavity.

These arguments suggest that the locations of these spectral
peaks would make good speaker recognition measurements, since
they are closely tied to a specific anatomical feature. In
actual fact, the nasal spectra, at least as shown by the
36-channel filter bank, do show variations among speakers, but
these spectral peaks are often impossible to identify, thus
failing the measurability criterion.

Some examples of this variation of the visible features are
shown in the computer display photographs in Fig. 6. Each row
contains four examples of /m/ by one speaker, and different
speakers are represented by different rows. The top row is by
the same speaker as in Fig. 3. The second row shows a speaker
whose F2 is not completely cancelled by the zero, resulting in a
small peak around 800 Hz. The third row shows a speaker whose F2
and F3 are both considerably affected by the zero, resulting in
a lack of peaks in that region of the spectrum. The formant
damping is generally higher in the nasals than in the vowels,
and that may also have contributed to the weakness of the peaks.
In the fourth row, the peak around 3 kHz is absent from the
spectrum. This is probably due in this case to the proximity of
that pole with the second zero of the mouth cavity. The bottom
two rows show /m/ spectra in which the consistent identification
of those peaks is difficult, if not impossible. The spectra of
/n/ show similar effects.

The additional variability introduced by the zeros and the
reduction in the definition of spectral peaks caused by the
higher damping of the nasal consonants make formant measurement
a difficult if not impossible technique. Prof. D. Klatt
suggested that the individual filter outputs in the neighborhood
of these formants be examined to see if they are generally
sensitive to changes in formant location. Since such data are
subject to variation due to differences in voice level, they
must be suitably normalized.

A subprogram was written for SPADER which performed the
accumulation of data from selected filter outputs, subject to an
intensity-normalization term for each utterance. This subprogram
was used to make measurements in the middle of the /n/ and first
/m/ in sentence 3 (points 3c and 3f in Fig. 4). They were
normalized by subtracting the value of the energy function in
the following vowel. Filters 5-8 were used for the energy
function in this case, since it was found that this frequency
region emphasized the lower intensity of the nasals. The nasal
consonants then showed up on the energy function as short
regions of noticeably lower values (refer back to Fig. 3). These
nasal measurements were given
the names 3Nj or 3Mj, where j is the filter number.
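The intensity normalization for the nasal measurements can be sketched as follows. The text states only that the value of the energy function in the following vowel is subtracted; the function name, the data layout, and the use of a single vowel frame as the reference are assumptions made for this example, with filters 5-8 taken from the grouping the text gives for nasal consonant measurements.

```python
def normalized_filter_outputs(nasal_frame, vowel_frame, energy_filters=range(5, 9)):
    """Subtract the vowel's energy-function value (dB) from each nasal filter output."""
    energy_filters = list(energy_filters)
    reference = sum(vowel_frame[k] for k in energy_filters) / len(energy_filters)
    return [level - reference for level in nasal_frame]


# Hypothetical 36-channel frames (dB) for a nasal and the following vowel.
nasal = [40.0] * 36
vowel = [60.0] * 36
quiet = normalized_filter_outputs(nasal, vowel)

# The same utterance spoken 6 dB louder gives the same normalized values.
loud = normalized_filter_outputs([x + 6.0 for x in nasal], [x + 6.0 for x in vowel])
print(quiet[0], loud[0])
```

Because both the nasal and the reference vowel rise together with voice level, the difference in dB removes the overall level of the utterance.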


In this manner, many different measurements were accumulated for
these examples of /m/ and /n/. Figure 7 shows graphs of the
F-ratio for each measurement versus filter number (hence versus
frequency). The measurements taken from the filters that roughly
correspond to the frequencies of the spectrum features described
above make broad peaks in the F-ratio curves. The fact that
these maxima are not just due to single points having high
values supports the contention that they represent these
features. These maxima correspond to the region of pole-zero
interplay below 1 kHz and to the formants around 0.25, 2, and
3 kHz in /m/ (filters 1, 6, 17, and 23), and to the formants
around 2 and 3 kHz in /n/ (filters 8, 18, and 25). This
technique falls short of the criterion of tying a measurement
directly to a structural feature, but in the absence of the
ability to characterize speakers by the location of certain
spectral features, it does take cognizance of the locations of
the poles and zeros underlying the spectra.
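For readers who want to reproduce curves of this kind, the sketch below computes an F-ratio for one measurement from repeated examples by several speakers. It assumes the standard one-way analysis-of-variance statistic (between-speaker mean square divided by within-speaker mean square); the exact form used in this study is defined in Chapter 3 and may differ in normalization. The function name and the data are hypothetical.

```python
def f_ratio(groups):
    """One-way analysis-of-variance F statistic.

    groups holds one list of measurement values per speaker.
    """
    m = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the speaker means about the grand mean.
    between = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means)) / (m - 1)
    # Variation of each speaker's examples about that speaker's own mean.
    within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g) / (n_total - m)
    return between / within


# Hypothetical normalized outputs (dB) of one filter, three examples per speaker.
filter_17 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(f_ratio(filter_17))  # 27.0
```

Applying the function to each filter's measurements in turn and plotting the result against filter number would give a curve of the kind described for Fig. 7.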

Although the nasal consonants are less subject to movement of
the articulators than other sounds, they may be particularly
sensitive to the state of health of the speaker. Certainly a bad
cold can block the nasal passage completely, with the result
that the nasals are transformed into the corresponding voiced
stops. No work seems to have been done on the effects of
respiratory inflammations on the acoustic correlates of speech,
so it is not possible to state at this time the conditions under
which these or other measurements will deviate significantly
from the normal.

The length of the vocal tract and the sizes of its various parts
determine the frequency ranges of the formants. Speakers differ
somewhat in these ranges, yet listeners can easily perceive the
same vowel in spite of large differences in vocal tract size
(Peterson and Barney, 1952). Range of formants has been found to
be a correlate of voice quality (Ladefoged and Broadbent, 1957;
Shearme and Holmes, 1959; Miller, 1964). It has also been shown
that the identification of a vowel can be strongly influenced by
changing the formant ranges of surrounding vowels (Ladefoged and
Broadbent, 1957). This last finding suggests that the listener
effectively ap- [...] since we can immediately understand the
speech of a stranger. Since this normalization is
speaker-specific, its acoustic correlates would be good
measurements for speaker recognition.

Hemdal (1967) used the formant frequencies of the schwa vowel
(/ə/) as reference data for formant variability compensation in
a speech recognition experiment, with moderate success. [...]
this unstressed vowel [...] reflect the vocal tract length of
each individual.

Gerstman (1960) has shown that scaling F1 and F2 linearly
between the extreme values of F1 and F2 for each speaker is an
effective procedure for reducing vowel formant variability
across many speakers. This procedure suggests that the extremes
of vowel articulation (/i/, /a/, and /u/) provide some sort of
reference points for the formant ranges of the individual. It
can perhaps be argued that these articulations are more stable
than others since they require control only to the extent of
moving the articulators to an extreme position, as opposed to an
intermediate position. In fact, it has been found that the first
two formants in /i/, /a/, and /u/ are the least sensitive to the
effect of context (Stevens and House, 1963). This argument is
supported by Stevens' theory of the quantal nature of certain
vowel articulations (Stevens, in press).
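The linear scaling attributed to Gerstman above can be illustrated with a short sketch. The function name, the 0-to-1 output range, and the formant values are assumptions chosen for the example; they are illustrative figures, not data from this study.

```python
def scale_formants(vowels):
    """Rescale each (F1, F2) pair linearly between one speaker's extreme values.

    vowels maps a vowel label to (F1, F2) in Hz for a single speaker.
    The result maps each label to a pair of values between 0 and 1.
    """
    f1_values = [f1 for f1, _ in vowels.values()]
    f2_values = [f2 for _, f2 in vowels.values()]
    lo1, hi1 = min(f1_values), max(f1_values)
    lo2, hi2 = min(f2_values), max(f2_values)
    return {
        label: ((f1 - lo1) / (hi1 - lo1), (f2 - lo2) / (hi2 - lo2))
        for label, (f1, f2) in vowels.items()
    }


# Hypothetical formant frequencies (Hz) for one speaker.
speaker = {"i": (270, 2290), "a": (730, 1090), "u": (300, 870), "schwa": (500, 1500)}
scaled = scale_formants(speaker)
print(scaled["i"], scaled["a"], scaled["u"])
```

In this example the extreme vowels /i/, /a/, and /u/ fix the ends of the scale, so every other vowel of the same speaker is expressed relative to them.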

Four vowels were examined for their use in speaker recognition
measurements. They were the schwa in sentence 2 (point 2a in
Fig. 4), the /a/ and /i/ in sentence 4 (points 4a and 4b in
Fig. 5), and the /æ/ in sentence 6 (point 6a in Fig. 5).

The formants of the schwa vowel are ideally spaced at intervals
of approximately 1 kHz, and hence they show up as distinct peaks
with the filter bank spectrum analyzer. The formant frequencies
were measured using a peak interpolation algorithm in SPADER,
which roughly interpolates the frequency of a spectral peak from
the local maximum and the data on either side. The peak
corresponding to F3 was generally weak and occasionally absent,
and F4 was also frequently unclear, so the first two formants
were the only measurements analyzed further. For measurements
UHF1 and UHF2, the values of F-ratio were 21.1 and 41.6.

Measuring the frequencies of F2, F3, or F4 in the vowel /i/ is
often not possible with the filter bank spectrum analyzer. All
three formants combine to form a broad concentration of energy
in the 2-4 kHz region, and the analyzing filters are not narrow
enough to permit the resolution of the individual peaks in all
cases. This condition is illustrated by the examples of /i/ from
four speakers shown in Fig. 8. The shape of this broad high
frequency peak, which is determined by the frequencies and
bandwidths of F2, F3, and F4, seems to be characteristic of the
speaker.

A subroutine was written to evaluate the second and third
central moments and the skewness of any frequency range of a
selected spectrum. These measurements pertain to the shape of
the spectrum as displayed, i.e., the horizontal coordinate
represents filter number, and the vertical coordinate represents
amplitude in dB. The skewness is defined as

    $\mu_3 / \sigma^3$

where $\sigma^2$ and $\mu_3$ are the second and third central
moments. It was found that since the variance of $\mu_3$ was
much greater than that of $\sigma^2$, the skewness measurement
behaved similarly to $\mu_3$, and it was subsequently discarded.
It was also realized that these moments depended on the overall
height of the curve, and that the portion of the curve below the
minimum value made a large contribution to these moments, which
remained constant even if the shape changed. The algorithm was
modified to calculate the moments, setting the zero coordinate
to the minimum value in the range, so as to produce a greater
variation in the moments, due to changes of shape.
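The modified moment calculation can be sketched as follows. The sketch assumes that the part of the spectrum curve above its minimum in the selected range is treated as a weight distribution over filter number; the function name and the example spectra are hypothetical, and the original subroutine may have differed in detail.

```python
def shape_moments(spectrum_db, first, last):
    """Second and third central moments of spectrum_db[first..last] (inclusive).

    The zero of the amplitude scale is moved to the minimum value in the
    range, so only the part of the curve above that minimum contributes.
    """
    band = spectrum_db[first:last + 1]
    floor = min(band)
    weights = [a - floor for a in band]
    total = sum(weights)
    if total == 0:
        raise ValueError("spectrum is flat over the selected range")
    positions = range(first, last + 1)
    centroid = sum(k * w for k, w in zip(positions, weights)) / total
    second = sum((k - centroid) ** 2 * w for k, w in zip(positions, weights)) / total
    third = sum((k - centroid) ** 3 * w for k, w in zip(positions, weights)) / total
    return second, third


# Hypothetical symmetric peak (dB versus filter number), and the same peak
# raised by 20 dB: moving the zero to the minimum makes the moments equal.
peak = [10.0, 12.0, 14.0, 12.0, 10.0]
raised = [a + 20.0 for a in peak]
print(shape_moments(peak, 0, 4), shape_moments(raised, 0, 4))
```

With the zero set at the minimum, a change in overall level leaves the moments unchanged, so they respond only to changes of shape.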

The frequency range 1.55-4.35 kHz was empirically selected for
the vowel /i/, since that included the major portion of the
concentration. For measurements IS2, the second central moment,
and IU3, the third central moment, F-ratios of 32.7 and 34.7
were obtained.

Measurement of F1 and F2 in the vowel /a/ is also difficult for
many speakers. These formants are close together, producing a
broad peak in the range 500-1500 Hz, as illustrated by the
examples of /a/ from four speakers shown in Fig. 9. The second
and third central moments for /a/ were measured over the range
[...] Hz, but these measurements were not as successful as the
ones for /i/. For AS2, the second central moment, and AU3, the
third central moment, F-ratios of 11.8 and 10.2 were obtained.

The analysis-by-synthesis program described in Chapter 4 was
developed to permit formant measurements in the cases where the
spectral peaks are not distinct. The analysis procedure consists
of selecting a spectrum to be analyzed and then manually setting
the values of frequency and bandwidth of the first four formants
by means of pushbuttons and a knob adjustment so that a good
match is obtained between the real and synthesized spectra in
the range up to 3.2 kHz. The fifth formant was kept constant at
4.5 kHz, for it had little effect in the comparison range. The
value of F0 measured at the time of the selected spectrum was
used in the synthesis.

Fig. 9. Spectra of /a/. Each row contains examples by a single
speaker.

This technique was first applied to the analysis of /æ/. This
vowel was felt to be an easy one, since like /ə/, the first four
formants are generally distinct. With the experience gained on
/æ/, the vowel /a/ was also analyzed. In this case, the F1 and
F2 peaks were usually not distinct, and the task was somewhat
harder. For most examples, a successful match could be obtained
in a little over a minute, and the job was made easier by the
fact that the 10 examples by each speaker were similar. The task
became tedious for a large number of analyses. This analysis
technique is amenable to automation (Paul, et al., 1964).

The match between the two spectra, as expressed by the squared
error (Bell, et al., 1961), is much less sensitive to the
formant bandwidths than to the formant frequencies. In addition,
the bandwidths sometimes had to be set to extreme values in
order to have the formant peaks at the right amplitudes,
particularly in the case of [...]. This condition is attributed
to inaccuracy of the glottal source spectrum approximation.
Consequently, the bandwidths were not felt to be as accurate as
the formant frequencies. The inaccuracy due to the glottal
spectrum effect does not necessarily invalidate the use of the
bandwidths, but it does mean that their intra-speaker
variability comes from two separate sources. For that reason and
because of the first effect stated above, they were not tried as
speaker-characterizing measurements. The third formant peak in
both /æ/ and /a/ is often indistinct, so only the first two
formant frequencies were used. The values of F-ratio for the
first two formants of /ə/ (repeated), /æ/, and /a/ are given
below.


    UHF1:  21.1     UHF2:  41.6     (from spectrum peaks)
    AEF1:  15.5     AEF2:  46.6     (analysis-by-synthesis)
    AF1:   22.9     AF2:   19.0     (analysis-by-synthesis)

In the vowels /ə/ and /æ/, the F2 measurement is by far the
better one. Inspection of the measurement statistics in
Appendix II shows that the total distributions of the F2
measurements are about twice as wide as the F1 measurements. The
higher variability of F2 can be interpreted as greater
opportunity for variation among speakers. In the case of the
vowel /a/, the F1 distribution is the wider, but neither F1 nor
F2 is as wide as in /ə/ and /æ/.


5.4  Source spectrum slope

The laryngeal excitation exhibits characteristics of individual
larynges, as we have seen in the case of fundamental frequency.
The structure of the larynx affects not only the pulse
repetition rate, but also the pulse shape, which is reflected in
the envelope of the laryngeal source spectrum. Unfortunately,
this spectrum is not directly accessible to measurement in the
speech signal, since of course it is modified by the transfer
function of the upper vocal tract. Martony (1965), using inverse
filtering, has found significant differences among speakers in
the high frequency slope of the source spectrum. The inverse
filtering technique is complex and probably not amenable to
automatic processing.

Using a suggestion by Prof. K. Stevens, a measurement which
crudely approximates source spectrum slope from a vowel spectrum
was implemented with moderate success. The higher formant peaks
in a vowel spectrum fall off in amplitude, due to the source
spectrum slope of about -12 to -18 dB/octave and to the
increased damping of the higher formants (and the +6 dB/octave
radiation characteristic). Amplitude measurements at a low
frequency formant peak and at a high frequency formant peak
would approximate the extent of this drop, if there were little
variation in the sharpness of the peaks, if formants were not so
close as to enhance each other's amplitudes, and if there were
little variation in the positions of the other formants.
Variations in the frequency separation of the peaks could be
roughly compensated for by dividing the amplitude difference (in
dB) by the frequency difference on a logarithmic scale.
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These  conclltlons  arc  appr-oxlraately  satisfied  in  the  vo’.vel 

.  o 

/u/.  Fj  is  lovjj  and  is  gerier-ally  at  least  an  octave  higher 
P^  and  Pij  are  gcnei-ally  separate  ^  and  at  least  one  of  then  is 
usually  visible.  That  is^  iVf  P^  is  very  v'eak,  Pjj^  can  serve 
as  a  measureraent  point,  0?he  actual  algorithm  for  the  measure¬ 
ment  is  the  difference  in  araplitudes  (in  dB)  betv/een  the 
maximum  belov'  530  Hr,  (:i. .e.^  P^ )  and  the  maximum  above  2  kHz 
(i.e,^  P,^  or  Fjj^ ) ;,  divided  by  their  frequency  difference  on  a 
logarithmic  scale  (i.e.;,  log  P^  -  log  ^  p )  •  This  algorithm 
may  not  be  intuitively  pleasing  as  an  approximation  to  the 
phenoraenon  it  purports  to  measure,,  but  it  has  been  used  v;lth 
some  success.  It  is  just  not  certain  that  this  success  is  not 
partl.y  due  to  the  coiribination  of  other  factors  v/hlch  affect  it 
Tills  racasureraent j  named  UiD.A,,  v/as  impleiaented  for  the  /u/  in 
sentence  1  (point  la  in  Pig.  7)^,  taken  at  a  point  one-third  of 
the  v;ay  through  the  first  syllable ^  to  simplify  the  problera 
of  segmenting  the  /u/  frora  the  ///. 
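The UH1A computation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name and argument layout are hypothetical, the spectrum is taken to be a list of filter center frequencies with their outputs in dB, the logarithm is taken to base 2 (so the result is in dB/octave), and the sign convention (high peak minus low peak) is a choice made here, not something the text specifies.

```python
import math

def slope_measure(freqs_hz, amps_db, low_limit_hz=530.0, high_limit_hz=2000.0):
    """Amplitude change (dB) per octave between the largest spectral value
    below low_limit_hz and the largest value above high_limit_hz.

    freqs_hz and amps_db are parallel sequences (hypothetical layout)."""
    low = [(a, f) for f, a in zip(freqs_hz, amps_db) if f < low_limit_hz]
    high = [(a, f) for f, a in zip(freqs_hz, amps_db) if f > high_limit_hz]
    if not low or not high:
        raise ValueError("spectrum needs samples below and above the limits")
    a_low, f_low = max(low)      # strongest output in the low band
    a_high, f_high = max(high)   # strongest output in the high band
    # Dividing by the log-frequency separation compensates for the
    # variation in peak spacing from one utterance to another.
    return (a_high - a_low) / math.log2(f_high / f_low)
```

With a spectrum that falls 36 dB between 250 Hz and 4 kHz (four octaves), the sketch returns -9 dB/octave.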

The F-ratio for this measurement was 36.3. Several alternate
measurements of this type were also tried, with very little
success. One of these was similar to UH1A, except that the second
measurement point was the relative minimum between two of the
formant peaks. Others omitted the division by the frequency
separation term.


5.5 The fricative /ʃ/

The spectrum of the fricative /ʃ/ depends mainly on the
anatomical details of the region around and forward of the
alveolar ridge. Hence measurements on /ʃ/ are not influenced by
the entire vocal tract, but only by a small portion of it. It was
found that the locations of the high frequency spectral peaks are
not particularly stable, but that the shape of the high frequency
region seems to be characteristic of the speaker.

It is possible to classify examples of /ʃ/ in terms of gross
shape. Figure 10 shows examples of /ʃ/ by four speakers. These
examples illustrate the four shapes that have been so defined.
They are, from the top, single narrow peak, wide or double peak,
flat region, and very low major peak. (The asymptotically flat low
frequency region shown in the display is due to the fact that the
high frequency skirts of those filters are coincident. The
amplitude of those filter outputs is an artifact of the filter
characteristics, rather than an indication of energy at those
points.)

The shape classification algorithm is described by the following
ordered set of rules.

1. If there is a major peak (i.e., a dip of at least 3 dB on
   the high side) lower than the 2.33 kHz filter, call it low
   major peak.

2. If the maximum drop in amplitude above the highest filter
   output is 6 dB or less, and

   2a. the highest filter output occurs lower than 5 kHz, or

   2b. the maximum drop in amplitude from the highest filter
       output to the 3925 Hz filter is less than or equal to
       6 dB,

   then call it flat region.

3. Examine the spectrum on either side of the highest filter
   output. If the 6 dB down points are less than 2 kHz apart,
   call it single narrow peak. Otherwise call it wide or double
   peak. This step requires interpolation between data points,
   since the spectrum may fall steeply. (If the highest filter
   output is so high in frequency that there is no upper 6 dB
   down point, call it single narrow peak. This would be rare
   for /ʃ/.)

Figure 10. Spectra of /ʃ/, illustrating the shapes. Each row
contains 4 examples by a single speaker.
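The ordered rules above can be sketched in code. The sketch below is one possible reading, not the program used in the study: the function names and the input layout (parallel lists of filter center frequencies in Hz and outputs in dB, in ascending frequency) are hypothetical, a "major peak" is taken to be a local rise followed by a fall of at least 3 dB, the 6 dB down points are taken to be the nearest crossings on each side of the highest output, and a missing lower crossing is replaced by the lowest filter frequency.

```python
def _crossing(f0, a0, f1, a1, level):
    """Frequency at which a straight line from (f0, a0) to (f1, a1) hits level."""
    return f0 + (level - a0) * (f1 - f0) / (a1 - a0)

def classify_shape(freqs_hz, amps_db):
    n = len(amps_db)

    # Rule 1: a peak below 2.33 kHz with a dip of at least 3 dB above it.
    for i in range(1, n):
        if freqs_hz[i] >= 2330.0:
            break
        if amps_db[i] <= amps_db[i - 1]:
            continue                      # not rising into a peak here
        j = i
        while j + 1 < n and amps_db[j + 1] <= amps_db[j]:
            j += 1                        # follow the fall on the high side
        if amps_db[i] - amps_db[j] >= 3.0:
            return "low major peak"

    p = max(range(n), key=lambda k: amps_db[k])   # highest filter output
    peak = amps_db[p]

    # Rule 2: little drop above the peak, plus condition 2a or 2b.
    drop_above = max((peak - a for a in amps_db[p + 1:]), default=0.0)
    if drop_above <= 6.0:
        drop_to_3925 = max((peak - amps_db[k] for k in range(p)
                            if freqs_hz[k] >= 3925.0), default=0.0)
        if freqs_hz[p] < 5000.0 or drop_to_3925 <= 6.0:
            return "flat region"

    # Rule 3: width between the interpolated 6 dB down points.
    level = peak - 6.0
    upper = None
    for k in range(p + 1, n):
        if amps_db[k] <= level:
            upper = _crossing(freqs_hz[k - 1], amps_db[k - 1],
                              freqs_hz[k], amps_db[k], level)
            break
    if upper is None:
        return "single narrow peak"       # no upper 6 dB down point
    lower = freqs_hz[0]                   # assumed fallback
    for k in range(p - 1, -1, -1):
        if amps_db[k] <= level:
            lower = _crossing(freqs_hz[k], amps_db[k],
                              freqs_hz[k + 1], amps_db[k + 1], level)
            break
    return "single narrow peak" if upper - lower < 2000.0 else "wide or double peak"
```

Because the rules are ordered, each later rule only sees spectra that the earlier rules did not claim, which is why the sketch returns as soon as a rule fires.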

This shape classification was performed on the example of /ʃ/
in sentence 6 (point 6b in Fig. 5). The /ʃ/ in "shirts" in
sentence 1 was not used, since the lip rounding due to
coarticulation with the following vowel strongly modifies the
spectrum so that this set of prototype shapes does not hold. Many
of the speakers had examples of /ʃ/ falling in two shape-classes,
and a few had a single example falling in a third class.

This measurement is a discrete, qualitative measurement, as
opposed to the continuous, quantitative measurements discussed
previously. The natural way to characterize a speaker is by
probability estimates of each shape-class, but this is not
directly compatible with quantitative measurements in terms of
evaluative measures and classification procedures. For those
purposes, the numerical values 2, 3, [illegible], and 7 were
arbitrarily assigned to the shape-classes narrow peak, wide peak,
flat region, and low major peak, respectively. This is probably
not an optimum assignment, but on this basis, measurement SH has
an F-ratio of 17.5.

Measurements of second and third central moments of the high
frequency region of /ʃ/ were also tried, with very little
success. The fricative /s/ was found to be similar to /ʃ/, but
with features occurring higher in frequency, nearer the upper
limit of the spectrum analyzer. It was not formally investigated.


5.6 Voice onset time


In voiced stops following an unvoiced segment, the onset of
voicing before the release of the stop is not used for phonetic
distinction in English, yet it is not uncommon. (This is the
"voicebar" observed in spectrograms.) The speaker-specificity of
this phenomenon was pointed out by M. Medress (personal
communication). It was examined in the single example of this
context in the data, "this bond" in sentence 6 (point 6c in
Fig. 5). A binary distinction (prevoiced or not) seemed
appropriate, since the duration of prevoicing showed wide
intraspeaker variations, and it is difficult to measure precisely.
A stop was scored as prevoiced if voicing preceded the burst by
20 msec. In the utterances from each of the 21 speakers, some
speakers prevoiced consistently, others did so only occasionally,
and 11 never did. Assigning the value of 1 to prevoiced examples
and 0 to unvoiced ones, an F-ratio of 14.5 was obtained for
measurement PREV.

This measurement is particularly appealing because it concerns a
rapid event which is not likely to be consciously modified, and it
is an event of such specificity that it is probably independent of
most other measurements. It is, however, dependent on good
recording conditions, since low frequency background noise or poor
low frequency response would make this measurement impossible.


5.7 Duration of "bought"

As a single example of a measurement of speech timing, the
duration of "bought" in sentence 3 (the points marked for this
word in Fig. 5) was investigated. This measurement, like the last,
is dependent on learned rather than organic characteristics of the
speaker. The energy function of a word that begins and ends with
stop consonants rises and falls sharply during the stopgaps, so
the measurement of duration is a simple matter. The measurement
BAWT was the number of frames (10 msec intervals) between the
half-amplitude points of the energy function. It was found that
the range for the set of speakers was not large, so the individual
ranges were not narrow with respect to it. In addition, the
narrowness of the ranges meant that the 10 msec quantization was
too coarse. In spite of these factors, BAWT has some capability
for speaker separation, since an F-ratio of 20.7 was obtained.
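A duration measurement of this kind might be sketched as below. The function name and the input layout (one energy value per 10 msec frame, covering the word and its stopgaps) are hypothetical, and "half-amplitude" is assumed here to mean half of the peak value of the energy function on a linear scale.

```python
def half_amplitude_duration(energy, frame_ms=10.0):
    """Return (frames, milliseconds) between the first and last frame whose
    energy is at least half of the peak energy.

    energy: one non-negative value per frame, with a positive peak."""
    peak = max(energy)
    above = [i for i, e in enumerate(energy) if e >= 0.5 * peak]
    frames = above[-1] - above[0]
    return frames, frames * frame_ms
```

The result is an integer number of frames, which makes the coarseness of the 10 msec quantization plain: speakers whose word durations differ by less than one frame receive the same value.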


5.8 Comparison of measurements

The measurements described above are summarized in Table 2,
ranked in order of F-ratio. The presence of nine F0 measurements
with high F-ratios at the top of the list does not necessarily
mean they should be implemented first in a speaker recognition
system, since they are likely to be heavily dependent.

To give some crude examples of the meaning of these F-ratios, if
all the speakers had normal distributions with equal variances of
σ² and if half of them were centered at -σ and half of them at
+σ, the resulting F-ratio would be 10; if they fell in four
groups, with adjacent means separated by 2σ, the F-ratio would
be 50.
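These two examples can be checked numerically. The sketch below assumes the form of the F-ratio given in Appendix I, F = n Var(μ_j) / <σ_j²>, with n = 10 repetitions per speaker and population (not sample) variances; the function name and the synthetic data are illustrative only.

```python
from statistics import fmean, pvariance

def f_ratio(data):
    """data[j] holds the n repetitions of one measurement by speaker j."""
    n = len(data[0])
    means = [fmean(reps) for reps in data]
    within = fmean([pvariance(reps) for reps in data])   # average speaker variance
    return n * pvariance(means) / within                 # n * Var(means) / within

def speaker(mean, sigma=1.0, n=10):
    """Synthetic speaker whose n repetitions have exactly this mean and sigma."""
    half = n // 2
    return [mean - sigma] * half + [mean + sigma] * half

two_groups = [speaker(-1.0), speaker(-1.0), speaker(1.0), speaker(1.0)]
four_groups = [speaker(m) for m in (-3.0, -1.0, 1.0, 3.0)]
```

Here `f_ratio(two_groups)` gives 10 and `f_ratio(four_groups)` gives 50, since the variance of the speaker means is σ² in the first case and 5σ² in the second.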

An array of the values of the AP statistic for all pairs of
these measurements is given in Table 3. As may be expected, the
AP values for the F0 measurements are generally much greater than
zero, with the notable exception of 5F02 and 5F03. Most other
pairs, such as 3F01 and PREV, have much smaller values. (Regarding
the many nasal measurements, it was found that measurements from
adjacent channels were highly dependent, but the dependence
decreased as the comparison progressed to more distant channels.
Compare 3M1 vs. 3M[illegible] and 3M1 vs. 3M17.) There is
presently no statistical basis for setting a threshold value on
AP, but for illustrative purposes, all values greater than or
equal to 0.28 have been circled. On that basis, the only
measurement pairs which are somewhat dependent are the F0
measurements, AEF0 and two of the F0 measurements, and 3M1 and
3M[illegible]. Pairwise independence (in the loose sense we have
been using the term) does not guarantee total mutual independence,
but again, it will probably suffice for the purpose of avoiding
heavily redundant measurements.

In order to determine whether the set of speakers used in this
experiment was biased by the inclusion of speech researchers, who
might tend to speak in a particularly consistent way that would
render them easy to identify, the speaker set was divided into two
groups. One group, of 10 members, were those who were
substantially concerned with speech research; the other group of
11 were not. For each of the 20 measurements listed in Table 4,
the relative variance (i.e., individual variance divided by the
total variance) for each speaker was tabulated and averaged for
that speaker. Then the averages of each group were computed and
compared. There was no significant difference in the
identifiability of the two groups as given by the relative
variance.


5.9 Identification results

Although the actual construction of a speaker recognition system
was not a primary aim of this study, the temptation of finding out
whether these measurements actually "work" was too great to
resist. Accordingly an elementary speaker identification algorithm
was programmed in FORTRAN IV on the PDP-9. Twenty measurements
were selected from Table 2, choosing those with the highest
F-ratio, but rejecting those with significant dependence
(arbitrarily, AP > 0.28) on the measurements already selected.
This set of measurements is listed in Table 4.
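The selection rule just described is a greedy procedure, and a minimal sketch of it is given below. The function name, the dictionary layout, and the example values are hypothetical (they are not the values of Tables 2 and 3), and pairs missing from the dependence table are assumed here to be independent.

```python
def select_measurements(f_ratios, dependence, max_count=20, threshold=0.28):
    """Pick measurements in decreasing order of F-ratio, skipping any whose
    dependence on an already selected measurement exceeds the threshold.

    f_ratios:   dict mapping measurement name -> F-ratio
    dependence: dict mapping frozenset({name_a, name_b}) -> dependence value"""
    selected = []
    for name in sorted(f_ratios, key=f_ratios.get, reverse=True):
        if len(selected) == max_count:
            break
        if all(dependence.get(frozenset((name, s)), 0.0) <= threshold
               for s in selected):
            selected.append(name)
    return selected

# Illustrative values only.
example_f = {"A": 80.0, "B": 70.0, "C": 60.0, "D": 50.0}
example_dep = {
    frozenset(("A", "B")): 0.50,   # B is rejected once A has been chosen
    frozenset(("B", "C")): 0.40,   # irrelevant, since B was never selected
    frozenset(("A", "D")): 0.10,
}
```

For the illustrative values, `select_measurements(example_f, example_dep)` keeps A, C, and D. Because the procedure is greedy, the result depends on the ranking order, so it need not be the smallest set that separates the speakers.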

The data, which consisted of ten repetitions by each speaker,
was partitioned into design and test sets. The design set was used
to form references for each speaker, by calculating the mean and
variance for each measurement for each speaker; the test data was
used to test the effectiveness of these references in
characterizing the individual speakers. This testing was done with
data that had no role in the determination of the references. In
order to make full use of the available data, each of the ten
repetitions was used in turn as the test set, while the remaining
nine were used to form the references.

The classification algorithm was a minimum distance procedure
using a weighted Euclidean distance metric similar to that used by
Pruzansky and Mathews (1964). If r measurements are used, each
datum is represented by a point in an r-dimensional space. The
average of the nine repetitions for each speaker in the design set
is the centroid of those nine points. The square of the distance
between a datum x = (x_1, x_2, ..., x_r) and the centroid of the
j-th class m_j = (m_{j1}, m_{j2}, ..., m_{jr}) is given by

\[ d^2(x, m_j) = \sum_{k=1}^{r} \frac{(x_k - m_{jk})^2}{\langle \sigma_k^2 \rangle_j} \]

where \langle \sigma_k^2 \rangle_j is the average speaker variance
for the k-th measurement over the reference data. Dividing the
squared distance in each dimension by the average speaker variance
weights it according to the average narrowness of the individual
speaker distributions. The distance to the centroid of each
speaker is computed, and the test datum is associated with the
speaker whose centroid is closest. Although this algorithm is
non-probabilistically motivated, it is in fact the optimum
classification procedure for speakers that are a priori equally
likely and measurements that are independent Gaussian random
variables with equal variances for each speaker (Nilsson, 1965).

TABLE 4. MEASUREMENTS SELECTED FOR IDENTIFICATION

        Name     F-ratio
   1.   5F02      84.9
   2.   5F03      54.3
   3.   AEF2      46.6
   4.   3M1       43.4
   5.   3M8       41.0
   6.   UH1A      36.3
   7.   IU3       34.4
   8.   IS2       32.7
   9.   3N8       32.5
  10.   3M17      24.8
  11.   3N23      24.4
  12.   AF1       22.9
  13.   3M23      21.7
  14.   UHF1      21.1
  15.   BAWT      20.7
  16.   AF2       19.0
  17.   SH        17.5
  18.   AEF1      15.5
  19.   PREV      14.5
  20.   AS2       11.8
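The minimum distance procedure and the rotation of design and test sets can be sketched as follows. This is an illustration, not the FORTRAN program of the study: the function names and the data layout (a dictionary from speaker to a list of repetitions, each a list of r measurement values) are hypothetical, and population variances are assumed, since the text does not say which variance estimate was used. A measurement with zero variance for every speaker would cause a division by zero and is not handled.

```python
from statistics import fmean, pvariance

def build_references(design):
    """Return (centroids, avg_var) from design[speaker] -> list of repetitions."""
    centroids = {s: [fmean(col) for col in zip(*reps)]
                 for s, reps in design.items()}
    r = len(next(iter(centroids.values())))
    # Average speaker variance of each measurement over the reference data.
    avg_var = [fmean([pvariance([rep[k] for rep in reps])
                      for reps in design.values()])
               for k in range(r)]
    return centroids, avg_var

def classify(x, centroids, avg_var):
    """Assign datum x to the speaker with the nearest centroid."""
    def dist2(m):
        return sum((xk - mk) ** 2 / vk for xk, mk, vk in zip(x, m, avg_var))
    return min(centroids, key=lambda s: dist2(centroids[s]))

def leave_one_out_errors(data):
    """Hold out repetition i of every speaker in turn and count the errors."""
    n = len(next(iter(data.values())))
    errors = 0
    for i in range(n):
        design = {s: reps[:i] + reps[i + 1:] for s, reps in data.items()}
        centroids, avg_var = build_references(design)
        for s, reps in data.items():
            if classify(reps[i], centroids, avg_var) != s:
                errors += 1
    return errors

# Illustrative data: two speakers, three repetitions, two measurements.
toy = {
    "s1": [[100.0, 1.0], [102.0, 1.2], [98.0, 0.8]],
    "s2": [[150.0, 2.0], [152.0, 2.2], [148.0, 1.8]],
}
```

On the toy data every held-out repetition is assigned to its own speaker, so `leave_one_out_errors(toy)` is 0. Because the test repetition never contributes to the centroids or variances, the error count is not biased by testing on the design data.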

When the first 17 measurements in Table 4 were used, no
identification errors were made in the classification of 10
repetitions by each of 21 speakers. If the measurements are
selected by trial and error rather than systematically by a priori
evaluation as was done here, perfect recognition can be achieved
with fewer measurements, since the F-ratio is not an optimizing
statistic. However, its usefulness, along with that of the AP
statistic, is demonstrated by the success achieved here with a
computationally simple classifier and small number of effective
measurements.




CHAPTER 6
CONCLUSION


This study has been directed toward the improvement of speaker
recognition techniques by means of improving the characterizing
measurements made on the voice signal. The approach adopted here
makes specific measurements on speech events which have been
segmented and located in the utterance. The choices of the
phonetic segments and the measurements made on them are guided by
considerations of vocal tract structure and the ways in which the
various speech sounds are produced. The final selection of
measurements is aided by techniques of evaluating the speaker
separating ability and the interdependence of the measurements.

For the conditions of this experiment, measurements of
fundamental frequency proved to be the most useful single
measurements investigated. They were generally interdependent, so
other, independent measurements were usually preferable to
multiple F0 measurements. Most of the other measurements were not
heavily dependent on each other. Nasals were characterized by
certain individual filter outputs. Formant measurements of the
vowels that were studied were useful, as were spectrum shape
parameters in cases where formant locations were difficult to
measure. The wider interspeaker variation of the second formant
made that one generally better than the first formant. A rough
estimation of the glottal source spectrum slope was also
effective. The information conveyed by the measurements of
duration of the word, shape of /ʃ/ spectrum, and prevoicing was
limited by their coarseness of quantization, but they also proved
to be useful.

The validity of this selective and efficient approach to
acoustic measurements for speaker recognition is demonstrated by
the success achieved in speaker identification with a small number
of such measurements and a simple linear classification procedure.
A direct comparison of results by different workers is usually not
possible due to different sets of constraints placed on the
problem. Subjectively, however, the result achieved here compares
very favorably with the reports in the current literature.

The set of measurements developed here cannot be called optimum,
since only a relatively small number of possible measurements were
investigated. Extended research will probably produce more
independent acoustic measurements with equivalent F-ratios at
least in the 40's and 50's. There are several specific areas that
should be good candidates for such extensions:

1. The spectra of vowels should yield more useful data. The
   improvement and automation of the analysis-by-synthesis
   technique would be a great aid for providing fast, reliable
   formant measurements. The recently introduced chirp
   z-transform algorithm (Rabiner et al., 1969) should also be
   useful.

2. The nasal consonants should bear a closer analysis so that a
   more satisfactory means of characterizing them can be
   devised. As indicated in Chapter 5, the pole and zero
   locations are probably the significant factors.
   Analysis-by-synthesis, as used by Fujimura (1962), if
   amenable to automation in this more complex case, may prove
   useful in this respect.

3. Further investigation of the laryngeal excitation
   characteristics should be done. Acceptable automatic inverse
   filtering might be accomplished by means of parameters
   derived by automated analysis-by-synthesis. Perturbations in
   pitch period may also be characteristic of individual
   larynges.

4. A largely untapped area is that of temporal patterns in the
   speech signal. This area includes effects such as rate and
   extent of formant transitions, the coordination of different
   articulators, and durations of certain segments. An
   interesting problem is that of normalizing temporal patterns
   for the rate of speech. Temporal patterns are admittedly more
   difficult to characterize than some spectral patterns, but
   they must contain much information about learned
   characteristics.

Both present and future measurements must be subjected to close
scrutiny in terms of their stability with respect to time and the
state of health of the speaker. The influence of the emotional
state of a speaker and the effect on a person's learned
characteristics of moving to a region where a different dialect is
spoken are not known. The susceptibility of speaker recognition
measurements to voice mimicry and disguise should also be
investigated. There is also a need for simultaneous investigation
of vocal tract anatomy and acoustic characteristics of different
speakers.

Another speaker recognition paradigm which has already arisen in
law enforcement situations may be called uncontrolled speaker
verification. The only acoustic evidence available is two speech
samples, and the only question is whether they were uttered by the
same speaker. In this case, it may not be possible to form a set
of reference patterns from many repetitions of utterances.
Different techniques may be required.

The measurements described here were done on phones in single,
fixed contexts. This method can be eventually generalized to any
context, or at least to a subset of contexts. Then it is
conceivable that future automatic speaker recognizers with
advanced speech recognition capability will be able to extract the
necessary measurements from arbitrary context.


APPENDIX I

EQUIVALENCE OF AVERAGE RELATIVE VARIANCE AND F-RATIO


Let x_{ij} denote the measurement datum of the i-th repetition
by the j-th speaker, i = 1, 2, ..., n; j = 1, 2, ..., m. Let
\langle\ \rangle_k denote the average over the subscript k. Let
\mu_j and \sigma_j^2 denote the mean and variance of the data of
the j-th speaker. Let \bar{\mu} and \sigma_{tot}^2 denote the mean
and variance of the data pooled over all the speakers.

Let a denote the average relative variance, and let F denote the
F-ratio.

\[ a = \frac{\langle \sigma_j^2 \rangle_j}{\sigma_{tot}^2} \]

\[ F = \frac{n\,[\mathrm{Var}(\mu_j)]}{\langle \sigma_j^2 \rangle_j} \]

\[ \mathrm{Var}(\mu_j) = \langle \mu_j^2 \rangle_j - \bar{\mu}^2
   = \langle \langle x_{ij}^2 \rangle_i - \sigma_j^2 \rangle_j - \bar{\mu}^2 \]

Hence

\[ \mathrm{Var}(\mu_j) = \langle x_{ij}^2 \rangle_{ij} - \langle \sigma_j^2 \rangle_j - \bar{\mu}^2
   = \sigma_{tot}^2 - \langle \sigma_j^2 \rangle_j \]

\[ F = \frac{n\,[\sigma_{tot}^2 - \langle \sigma_j^2 \rangle_j]}{\langle \sigma_j^2 \rangle_j}
   = n \left( \frac{1}{a} - 1 \right) \]

Since the F-ratio is a monotonic function of the average
relative variance, ranking measurements in decreasing order of
F-ratios is equivalent to ranking them in increasing order of
average relative variances.


APPENDIX II

SUMMARIES OF MEASUREMENTS


This appendix contains summaries of the individual measurements
described in Chapter 5. Each summary covers the utterances by each
of the 21 speakers. "Total sigma" is the estimated standard
deviation of the data pooled over all speakers. "Rsigma" is the
ratio of the individual estimated standard deviation to the pooled
standard deviation, or the square root of the relative variance.

For the purposes of format or representation as integer
variables, certain measurements were subjected to translation or
scale change. The following measurements had 1000 Hz subtracted
from them: UHF0, AEF2, and AF2. The following measurements were
multiplied by 10: IS2, IU3, AS2, AU3, and UH1A.
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MF-ASlI-iRr'.F.'jT  JFai 


TOTAL 

PAN  GE ; 

6  5 

TOTAL 

Si GMA; 

13.43 

AVG  . 

SI  GM.;  :  4 

.83 

AVG. 

RSI GMA ;0 

.363 

F-RATIO:  61 .3 

SPKR 

RA  NGE 

MEAN 

S!  GMA 

RSI  GMA 

1 

13 

IZ8.6 

2.95 

3.223 

2 

7 

1  1  .  3 

2. '••'6 

0.168 

5 

13 

121.2 

3.94 

3.293 

4 

14 

113.0 

5.47 

3. 438 

5 

38 

1  1  4. A 

11.34 

3.822 

6 

1  1 

9  1.6 

3.57 

0.36  5 

7 

17 

139.1 

5.23 

3.387 

8 

13 

103. B 

3.77 

3.280 

9 

1  3 

1  33.2 

3.22 

0.242 

13 

15 

109.9 

5.34 

3.398 

1 1 

15 

112.1 

4.61 

3.3  43 

12 

21 

137.4 

5.56 

3.4  14 

13 

14 

113.3 

4.37 

0.325 

M 

IJ 

115.5 

4.43 

3.333 

15 

21 

131.9 

5.B8 

3.438 

16 

13 

143.6 

4.22 

3.512 

1  7 

1  1 

94.7 

3.59 

3.267 

18 

13 

117.4 

4.36 

0.332 

15 

26 

112.7 

7.01 

3.522 

20 

17 

102.0 

5.16 

3.384 

21 

19 

122.4 

5.82 

0.433 

MEASUKEMtNV  JF'.'2 

TOTSL  RANGE-  76 
TOTAL  SIGMA;  IJ.S'J 
AVG.  SI GMA  ;  A.7P 
AVG.  RSI GMA  JS 
F-RATIO;  71.2 


'KP 

RANGE 

MEAN 

SI  GMA 

RSI  GMA 

1 

8 

115.3 

2.71 

3.195 

2 

8 

113.5 

2.83 

3.202 

5 

19 

123.3 

5.8A 

3.424 

4 

15 

1  12.8 

4.29 

3.309 

5 

25 

t  18.7 

7.96 

0.573 

6 

17 

9  1.7 

5.31 

0.361 

7 

9 

107.3 

3.55 

3.256 

8 

8 

102.7 

2.71 

3.195 

9 

1  1 

129.3 

5.63 

C.2S5 

10 

7 

117.2 

2.2  3 

3.  159 

1  1 

17 

112.5 

4.62 

3.333 

12 

18 

146.3 

6.95 

0.53  1 

13 

19 

1  16.5 

5.42 

0.393 

1  4 

15 

118.7 

5.19 

3.374 

15 

24 

103.2 

8.73 

0.627 

16 

12 

145.8 

5.6S 

3.265 

17 

19 

103.3 

5.81 

0.4  19 

18 

17 

1  13.3 

5.03 

0.362 

19 

9 

117.0 

3.71 

0.267 

20 

10 

104.9 

3.14 

0.226 

21 

20 

127.7 

5.62 

3,435 

MEASUREMENT  JFfI3 


total 

RANGE: 

99 

TOTAL 

SI GMA ; 

1  7.33 

AVG. 

SI  G.MA  :  8 

.  1  7 

AVG. 

RSI GMA  ;3 

.471 

F-RATlO:  33.9 

SPKR 

RANGE 

KEAN 

SI  GMA 

RSI  GMA 

1 

21 

135.7 

6.77 

3.390 

2 

19 

143.3 

6.4  1 

0.373 

3 

19 

154.5 

7.47 

0.431 

4 

29 

128.9 

A. 40 

0.485 

5 

61 

158.8 

18.24 

1.053 

6 

24 

123.3 

7.10 

3.4  10 

7 

U 

132. A 

3.7? 

0.215 

8 

32 

133.5 

8.77 

0.506 

9 

28 

155.2 

9.10 

3.525 

10 

18 

145.3 

6.31 

3.364 

1  1 

21 

131.6 

7.29 

0.421 

12 

53 

179.? 

15.35 

0.868 

13 

36 

140.2 

12.73 

0.7  35 

14 

22 

135.6 

6.31 

3.364 

15 

25 

134.2 

8.12 

3.469 

16 

1  7 

169.8 

5.73 

3.53  1 

17 

24 

133.1 

7.65 

C  .  44  1 

18 

28 

133.9 

8.80 

0.533 

19 

16 

141.3 

4.67 

0.269 

20 

19 

112.8 

5.09 

0.294 

21 

22 

1  43.2 

7.79 

0.449 

MEASUREMENT  JF2A 

TOTAL  RANGE:  7a 
TOTAL  SIGMA;  15.»,1 
AVG.  SI GMA ;  6.74 
AVG .  RSI GMA  r2.i65 
r-RATIO;  51.3 


SPKR 

range 

MEAN 

SI  GMA 

RSI GMA 

1 

14 

117.3 

4.60 

0.29  1 

2 

7 

137,8 

2.49 

0.157 

i 

25 

1.34.4 

8.4  1 

0.532 

A 

12 

124.6 

4.01 

0.253 

5 

1  4 

131  .? 

5.67 

0.359 

6 

8 

102.2 

2.62 

0.165 

7 

30 

123.5 

13.66 

0,674 

8 

7 

109.3 

2.83 

0.17? 

9 

39 

150.5 

13.83 

0.875 

10 

16 

1.29. 1 

4.00 

0.266 

1  1 

15 

113.7 

4.35 

0.275 

12 

18 

152.2 

6.03 

0.3  79 

13 

24 

127.5 

7.23 

3.4  60 

14 

6 

115.6 

2.01 

0.127 

15 

32 

116.4 

12.12 

0.767 

16 

22 

149.8 

6.23 

3.394 

17 

16 

102.3 

4.64 

0.294 

18 

12 

126.2 

3.79 

0.240 

19 

22 

120.9 

6.69 

0.423 

20 

13 

103.9 

3.63 

0.230 

21 

12 

127.6 

4.45 

0.282 

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MEASUREMENT  5F?5 

TOTAL  RANGEr  71 
TOTAL  SIGMA;  lA.SA 
AVG.  SIGNir  5. A? 
AVG.  RSIGMA-:'.^G5 
F-RATiO;  52.8 


SPKR 

RANGE 

MEAN 

SIGMA 

RSI GMA 

t 

17 

I2A.  1 

5.07 

8.3A  1 

2 

8 

132. G 

2 . 9A 

2.193 

3 

1? 

131.9 

5.5  7 

0.3  75 

A 

6 

113.3 

1  .83 

0.123 

5 

25 

139.1 

S.20 

0.552 

6 

1  7 

111.8 

5.75 

0.387 

7 

1  1 

117.1 

3. At 

0.230 

8 

13 

112.0 

A  .  00 

0.2  70 

9 

39 

150.7 

12.31 

0.S63 

10 

15 

132.6 

A  .A5 

0.303 

1  1 

12 

12  0.0 

A.  16 

0.281 

12 

21 

158.2 

8.08 

0.5AA 

13 

21 

128.5 

6.38 

0.A6A 

lA 

10 

120.2 

3.63 

0.2A3 

15 

32 

120.1 

11.70 

0.78? 

15 

8 

lAl  .8 

3.0,8 

0.233 

17 

13 

I  06.9 

A.  12 

0.273 

18 

16 

121  .6 

5.25 

0.353 

19 

17 

130.0 

6.11 

0.AI2 

20 

8 

1  0A.A 

2.A1 

0.  163 

21 

13 

132.6 

A. 27 

0.288 

MEASUNFCIENT  5F2  1 


TOTAL 

,  RANGE 

:  125 

TOTAL 

,  SIGMA 

r  23. 7A 

AVG. 

SI GMA  ; 

7.  A  1 

AVG. 

RSI GMA 

;0.3  12 

F-RATIO-  51 

.0 

SPKR 

RANGE 

MEAN 

SI  GMA 

RSI  GMA 

1 

19 

137.0 

5.72 

0.2A  1 

2 

10 

IA9.9 

3.51 

0.  IAS 

3 

A5 

167.  5 

I3.A9 

0.56R 

A 

15 

137.3 

5.52 

0.232 

5 

29 

192.9 

8.35 

0.35  1 

6 

2.3 

131.0 

7.07 

0.293 

7 

15 

13A.0 

5.33 

0.225 

8 

19 

lAI  .2 

5.73 

0.2A  1 

9 

27 

17A.5 

9.20 

0.388 

10 

28 

162.7 

8.35 

0,352 

1  1 

18 

137.8 

6.07 

0.256 

12 

36 

2IA.0 

12.06 

0.508 

13 

I  7 

IAA.0 

5.81 

0.2A5 

lA 

9 

1  A5.8 

3.A6 

0  .  I  A  6 

15 

33 

135.3 

10.11 

0.426 

16 

3a 

182.  A 

10.55 

0  .  A  A  A 

1  7 

28 

128.  1 

7.99 

0.337 

18 

A3 

I  A0.5 

IA.3A 

0. 60A 

19 

17 

156.9 

5.65 

0.237 

20 

7 

127.5 

2.  12 

0.059 

21 

15 

150.0 

5.25 

0.221 

MEASUREMENT  5FE2 


TOTAL 

RANGE- 

126 

TOTAL 

SI  GMA  : 

2A.06 

AVG, 

SI  GMA  : 

7.4A 

AVG. 

RSI GMA: 

0.309 

F-RATIO:  BA. 

9 

SPKR 

RANGE 

MEAN 

SI  GMA 

RSI GMA 

1 

17 

I4A.3 

A. 73 

0.197 

2 

10 

152.5 

3.87 

0.161 

5 

A5 

170.7 

12.37 

C  .  5  1  A 

A 

16 

133.5 

5.70 

0.23  7 

5 

57 

198.0. 

1  0.2S 

3.427 

6 

23 

I3A.  1 

6.79 

0.282 

7 

lA 

137.7 

4.76 

0.198 

8 

21 

1  A2  .  Q 

6.5A 

0.2  72 

9 

28 

177.  1 

9.50 

0.395 

10 

2  6 

165.3 

7 .  Qa 

0.332 

1  I 

IS 

1  A  3 . 2 

7.25 

0.301 

12 

37 

217.5 

12.88 

0.535 

13 

19 

IAA.9 

6.33 

0.263 

lA 

18 

152.9 

5.22 

3.217 

15 

27 

IA0.5 

8.77 

0.365 

16 

29 

1  B  7 . 6 

S.95 

0.372 

17 

2.7 

I  28. 9 

7.77 

0.323 

18 

A0 

IA3.0 

13.83 

3.575 

19 

15 

160.  1 

5.35 

0.22.A 

23 

8 

129.0 

2.A5 

0.102 

21 

13 

15A.5 

A.  79 

0.199 

MEASUREMENT  5E03 

total 

RANGE: 

60 

TOTAL 

SI GMA  : 

12.67 

AVG. 

SI  GMA  I  A 

.66 

AVG. 

RSI GMA  :  3 

.36  7 

F-RATlO;  5a. 5 

SPKR 

range 

MEAN 

SI  GMA 

SSI GMA 

1 

10 

109.1 

3.21 

■0.253 

2 

13 

12  1.9 

3.95 

0.512 

3 

12 

110,3 

4.22 

'’.353 

4 

6 

98.6 

1.90 

0.  150 

5 

I  6 

133.0 

4.99 

0.39A 

S 

5 

83.2 

I  .40. 

0.110 

7 

I  I 

I0A.6 

3.5  0 

0.2  76 

3 

21 

94. 6 

6.06 

0.478 

9 

21 

I  18.5 

6.06 

0.478 

10 

13 

9«  .2 

3.74 

0.295 

1  1 

15 

107.4 

A  .  65 

0.367 

12 

33 

120.5 

9.97 

0.787 

13 

21 

110.5 

6.36 

0.502 

lA 

9 

1  10. A 

2.9  1 

0.230 

15 

2  5 

100.8 

8.05 

0.635 

16 

13 

134.9 

A.  72 

0.3  73 

I  7 

26 

92.  A 

8.72 

0.683 

1 8 

9 

105.9 

2.69 

0.212 

19 

9 

107.8 

3.03 

0.24  3 

20 

8 

91.6 

3.20 

0.253 

21 

12 

1  18.  A 

A. 40 

0.347 

MEASUREMENT  ;)FiM

TOTAL  RANGE:  72
TOTAL  SIGMA:  14.84
AVG.  SIGMA:  4.93
AVG.  RSIGMA:  0.332
F-RATIO:  69.5

SPKR  RANGE    MEAN   SIGMA  RSIGMA
   1      8   116.0    2.98   0.201
   2      8   151.5    2.26   0.152
   3     13   12?.?    4.78   0.322
   4      8   110.9    2.92   0.197
   5     34   14?.4   10.35   0.697
   6     19   105.?    5.59   0.376
   7     14   109.4    4.62   0.312
   8      9   106.9    2.77   0.186
   9     15   155.8    4.18   0.282
  10     27   124.0    8.41   0.565
  11     16   111.0    4.22   0.284
  12     27   154.5    8.64   0.582
  13     12   122.8    3.91   0.263
  14     11   115.5    3.56   0.240
  15     25   111.2    6.45   0.434
  16     20   156.7    5.70   0.384
  17     15   102.1    5.47   0.368
  18     25   124.7    7.50   0.505
  19      9   119.7    2.75   0.185
  20      9   101.5    2.32   0.156
  21     12   126.9    4.20   0.283

MEASUREMENT  AEEO

TOTAL  RANGE:  82
TOTAL  SIGMA:  18.12
AVG.  SIGMA:  6.20
AVG.  RSIGMA:  0.342
F-RATIO:  72.2

SPKR  RANGE    MEAN   SIGMA  RSIGMA
   1     17   121.5    5.85   0.323
   2     19   158.0    5.37   0.297
   3     12   148.8    4.13   0.228
   4     11   125.7    4.22   0.233
   5     25   166.1    6.17   0.341
   6     18   114.?    6.04   0.333
   7     16   123.1    4.95   0.273
   8     27   124.1    7.92   0.437
   9     39   159.4   11.15   0.615
  10     25   155.2    6.99   0.386
  11     15   124.8    3.91   0.216
  12     27   158.6    9.17   0.506
  13     19   127.9    6.37   0.351
  14     26   122.7    7.29   0.402
  15     27   127.5    8.48   0.468
  16     16   167.0    5.64   0.311
  17     12   110.7    3.33   0.184
  18     12   121.2    3.85   0.213
  19     17   150.7    6.86   0.379
  20     21   110.2    6.49   0.358
  21     16   155.9    6.08   0.336

MEASUREMENT  JMl

TOTAL  RANGE:  18
TOTAL  SIGMA:  3.64
AVG.  SIGMA:  1.54
AVG.  RSIGMA:  0.424
F-RATIO:  43.4

SPKR  RANGE    MEAN   SIGMA  RSIGMA
   1      3    16.5    0.97   0.267
   2      6    21.0    1.94   0.534
   3      3    20.4    1.17   0.323
   4      3    22.8    1.03   0.284
   5      2    16.8    0.63   0.174
   6      5    19.4    2.46   0.675
   7      4    21.1    1.29   0.354
   8      4    21.2    1.55   0.426
   9      3    15.4    1.07   0.295
  10      6    18.7    1.89   0.519
  11      5    18.1    1.63   0.448
  12      9    14.5    2.45   0.674
  13      5    15.1    1.85   0.509
  14      5    12.2    1.40   0.384
  15      6    15.1    1.66   0.457
  16      6    19.5    1.89   0.519
  17      4    15.1    1.10   0.302
  18      5    14.5    1.34   0.368
  19      3    20.8    1.14   0.312
  20      7    21.7    2.31   0.635
  21      6    19.7    1.64   0.450

MEASUREMENT  5M6

TOTAL  RANGE:  22
TOTAL  SIGMA:  3.93
AVG.  SIGMA:  1.96
AVG.  RSIGMA:  0.498
F-RATIO:  28.4

SPKR  RANGE    MEAN   SIGMA  RSIGMA
   1      7     5.1    1.85   0.471
   2      3     6.1    0.99   0.253
   3      5     2.8    1.87   0.476
   4      3     7.3    0.82   0.209
   5      3     9.0    1.15   0.294
   6      7     4.5    2.42   0.614
   7      9     4.7    2.71   0.689
   8      7     7.6    2.07   0.525
   9      7     1.7    2.00   0.509
  10      6    -0.7    2.36   0.600
  11      6     5.2    2.15   0.547
  12     11    -2.5    3.17   0.806
  13      9     1.0    2.71   0.688
  14      3     2.7    1.16   0.295
  15      9    -1.2    2.97   0.756
  16      7     5.5    1.95   0.495
  17      3    -4.6    1.07   0.273
  18      5    -1.4    1.58   0.401
  19      4     5.0    1.41   0.359
  20      8     4.1    2.56   0.650
  21      7     0.9    2.13   0.542


MEASUREMENT  JM17

TOTAL  RANGE:  30
TOTAL  SIGMA:  5.52
AVG.  SIGMA:  2.84
AVG.  RSIGMA:  0.515
F-RATIO:  24.8

SPKR  RANGE    MEAN   SIGMA  RSIGMA
   1      5    -0.6    2.27   0.412
   2      6     1.0    2.00   0.363
   3      3     8.1    1.10   0.199
   4      7    18.5    2.27   0.412
   5      8    18.7    2.71   0.491
   6      8     4.5    2.72   0.493
   7      9     1.9    2.85   0.516
   8      6     9.7    2.54   0.461
   9      6     9.9    2.13   0.386
  10     10     6.0    3.06   0.554
  11     12    11.5    3.60   0.652
  12     17     7.1    4.82   0.873
  13      7     9.0    2.62   0.476
  14      5     6.4    1.58   0.286
  15     20     8.2    6.27   1.136
  16     11     4.6    3.20   0.581
  17      6     6.0    2.05   0.372
  18      8     9.1    2.56   0.464
  19     11     5.4    3.78   0.685
  20     10     9.6    3.41   0.617
  21      6     4.2    2.10   0.380

MEASUREMENT  iN8

TOTAL  RANGE:  22
TOTAL  SIGMA:  4.56
AVG.  SIGMA:  2.15
AVG.  RSIGMA:  0.471
F-RATIO:  32.5

SPKR  RANGE    MEAN   SIGMA  RSIGMA
   1      5    -4.3    1.42   0.311
   2      6    -1.7    1.70   0.373
   3      4    -3.1    1.20   0.262
   4     10    -1.3    2.71   0.594
   5      4     6.9    1.52   0.334
   6     12     3.5    3.57   0.782
   7      6    -4.1    1.97   0.432
   8      7     4.2    2.57   0.564
   9      5    -4.7    1.49   0.323
  10      7    -4.6    2.07   0.453
  11      8     3.0    2.40   0.527
  12      4    -5.3    1.34   0.293
  13      7    -3.9    2.13   0.467
  14      3     1.1    1.10   0.241
  15     16    -2.4    4.09   0.896
  16      5    -6.2    1.55   0.340
  17      8    -8.2    2.74   0.601
  18      7     3.8    2.44   0.535
  19      9    -0.9    2.96   0.649
  20      7     1.5    2.27   0.498
  21      5    -5.9    1.85   0.406

MEASUREMENT  JM2i

TOTAL  RANGE:  3?
TOTAL  SIGMA:  5.87
AVG.  SIGMA:  3.11
AVG.  RSIGMA:  0.530
F-RATIO:  21.7

SPKR  RANGE    MEAN   SIGMA  RSIGMA
   1      5    -6.9    1.52   0.259
   2     10    -2.7    3.59   0.611
   3      4    11.7    1.34   0.228
   4      8     9.9    2.64   0.450
   5      5     4.1    1.97   0.335
   6     10     1.3    3.06   0.520
   7     11     3.0    3.74   0.637
   8     12     0.2    3.52   0.599
   9     11    -0.4    3.69   0.628
  10     11     3.6    3.57   0.607
  11     14     2.4    4.35   0.741
  12     16    -4.7    4.62   0.786
  13      8    -2.2    2.74   0.467
  14      9    -1.0    2.67   0.454
  15     26     1.3    7.85   1.338
  16      9     2.9    3.11   0.529
  17      5    -0.2    1.75   0.298
  18      9    -4.6    2.45   0.419
  19      8     8.8    2.62   0.445
  20      9     8.5    2.80   0.476
  21      6    -2.7    1.77   0.301

MEASUREMENT  JNIB

TOTAL  RANGE:  28
TOTAL  SIGMA:  5.60
AVG.  SIGMA:  2.45
AVG.  RSIGMA:  0.438
F-RATIO:  41.0

SPKR  RANGE    MEAN   SIGMA  RSIGMA
   1      5    -9.2    1.76   0.315
   2      6    -4.1    1.85   0.331
   3      5     1.4    1.71   0.306
   4     11     7.6    3.06   0.547
   5      7     4.3    2.06   0.368
   6      4    -4.6    1.26   0.226
   7      9    -0.6    3.13   0.560
   8      9    -4.5    2.46   0.440
   9      8     3.1    2.33   0.416
  10      8    -3.4    3.37   0.603
  11     10     3.3    2.67   0.477
  12     10     0.7    2.79   0.499
  13      9     4.9    2.42   0.433
  14      4    -9.6    1.35   0.241
  15     12    -0.1    3.57   0.638
  16      5    -2.8    1.62   0.289
  17      7    -3.2    2.30   0.411
  18     12    10.2    3.52   0.629
  19     10     5.6    3.24   0.579
  20      7     3.4    2.37   0.423
  21     11     1.2    2.66   0.475


MEASUREMENT  JtJSi                        MEASUREMENT  IS2

TOTAL  RANGE:  29                         TOTAL  RANGE:  92
TOTAL  SIGMA:  6.14                       TOTAL  SIGMA:  14.27
AVG.  SIGMA:  3.19                        AVG.  SIGMA:  6.51
AVG.  RSIGMA:  0.519                      AVG.  RSIGMA:  0.455
F-RATIO:  24.4                            F-RATIO:  32.7

SPKR  RANGE    MEAN   SIGMA  RSIGMA       SPKR  RANGE    MEAN   SIGMA  RSIGMA
   1      8    -3.4    3.03   0.492          1     13   134.2    4.83   0.338
   2      8    -7.0    2.36   0.384          2     35   135.6   11.23   0.787
   3     11     2.8    4.16   0.677          3     24   166.1    7.43   0.521
   4     10    -2.1    3.28   0.534          4     27   104.?    7.69   0.539
   5      9    -?.2    2.66   0.433          5     11   133.3    3.33   0.233
   6      7     0.0    2.21   0.360          6     18   133.5    5.89   0.413
   7     14     4.4    5.04   0.820          7     22   134.2    6.58   0.461
   8     13    -?.?    3.72   0.605          8     22   138.1    7.98   0.559
   9     10    -5.9    3.18   0.517          9     15   123.7    5.46   0.383
  10     12    -8.2    3.85   0.627         10     24   136.1    6.62   0.464
  11      4    -7.1    1.45   0.236         11     30   146.9   11.08   0.777
  12     12    -7.?    2.90   0.472         12     42   143.?   12.26   0.868
  13      8    -5.?    2.72   0.442         13     23   148.5    6.58   0.461
  14      4    -5.9    1.45   0.236         14     13   133.1    4.61   0.323
  15     15    -1.0    5.16   0.840         15     22   159.9    6.59   0.462
  16      4    -6.2    1.69   0.274         16     19   137.6    5.10   0.358
  17      6    -9.0    2.26   0.368         17     10   134.5    3.44   0.241
  18     12     9.8    4.42   0.719         18     35   132.5   11.12   0.779
  19     19     5.5    6.08   0.989         19     11   119.3    3.29   0.231
  20      9     2.1    2.77   0.453         20      7   138.9    2.64   0.185
  21     13    -5.5    2.64   0.429         21     10   142.2    2.90   0.203

MEASUREMENT  11.5                         MEASUREMENT  AS2

TOTAL  RANGE:  356                        TOTAL  RANGE:  124
TOTAL  SIGMA:  77.96                      TOTAL  SIGMA:  15.43
AVG.  SIGMA:  35.65                       AVG.  SIGMA:  8.79
AVG.  RSIGMA:  0.458                      AVG.  RSIGMA:  0.570
F-RATIO:  34.4                            F-RATIO:  11.5

SPKR  RANGE    MEAN   SIGMA  RSIGMA       SPKR  RANGE    MEAN   SIGMA  RSIGMA
   1     95   -65.7   29.12   0.374          1     23    84.2    6.86   0.445
   2    155  -225.5   42.94   0.551          2     48    73.7   11.74   0.761
   3    160  -156.1   49.92   0.640          3     23    56.9    7.49   0.486
   4     70    13.2   22.92   0.255          4     19    91.6    6.36   0.412
   5     95   -75.3   35.77   0.459          5     26    80.7    9.08   0.589
   6    132  -243.5   33.26   0.427          6     41    84.9   11.81   0.765
   7    116  -239.3   36.47   0.468          7     16    84.3    5.06   0.328

8 

215 

-  72. A 
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